Adversarial Attacks on Embeddings: Poisoning Vector Stores and Manipulating Semantic Search
Problem
Embedding-based retrieval powers RAG pipelines, semantic search, recommendation systems, and classification. The embedding space is treated as a trustworthy representation of meaning, but it is not. An attacker who can inject documents into the indexing pipeline controls what gets retrieved. An attacker who understands the embedding model can craft adversarial documents that sit close to target queries in vector space while containing arbitrary content.
Most teams validate the text content of documents before indexing but never inspect the resulting embeddings. A document that passes content moderation can still be adversarial in embedding space: semantically close to high-value queries, positioned to displace legitimate results, and carrying payloads (prompt injections, misinformation, or data exfiltration instructions) that the retrieval system faithfully surfaces.
The attack surface spans three areas: poisoning the vector store through the ingestion pipeline, crafting queries that manipulate retrieval results, and exploiting the geometry of the embedding space to create adversarial collisions.
Threat Model
- Adversary: (1) Attacker with write access to the document ingestion pipeline (employee, compromised service account, supply chain). (2) Attacker who understands the embedding model architecture and can craft adversarial inputs offline. (3) Attacker who can submit queries to a RAG endpoint and observe retrieved content.
- Objective: Inject documents that get retrieved for high-value queries (poisoning). Displace legitimate documents from retrieval results (denial of service). Embed prompt injection payloads in documents that will be passed to an LLM. Extract information about the embedding model or indexed documents through query probing.
- Blast radius: Poisoned retrievals lead to incorrect LLM outputs. In agentic systems, this can trigger unauthorized actions. In customer-facing systems, it causes misinformation or brand damage.
Configuration
Embedding Integrity Validation at Ingestion
# embedding_validator.py - validate embeddings before they enter the vector store
import numpy as np
from typing import Tuple, List
from dataclasses import dataclass
@dataclass
class EmbeddingValidationResult:
valid: bool
reasons: List[str]
embedding_norm: float
nearest_cluster_distance: float
class EmbeddingValidator:
"""
Validate embeddings before insertion into the vector store.
Detects anomalous embeddings that may indicate adversarial crafting.
"""
def __init__(self, expected_dim: int = 1536, norm_range: Tuple[float, float] = (0.9, 1.1)):
self.expected_dim = expected_dim
self.norm_range = norm_range
self.cluster_centroids = None # loaded from baseline
self.max_cluster_distance = None
def load_baseline(self, centroids: np.ndarray, threshold_percentile_99: float):
"""Load cluster centroids from a baseline computed over known-good documents."""
self.cluster_centroids = centroids
self.max_cluster_distance = threshold_percentile_99
def validate(self, embedding: np.ndarray, document_text: str) -> EmbeddingValidationResult:
reasons = []
norm = float(np.linalg.norm(embedding))
# Check dimensionality
if embedding.shape[0] != self.expected_dim:
reasons.append(f"dimension_mismatch: expected {self.expected_dim}, got {embedding.shape[0]}")
# Check norm (most embedding models produce near-unit-norm vectors)
if not (self.norm_range[0] <= norm <= self.norm_range[1]):
reasons.append(f"abnormal_norm: {norm:.4f} outside [{self.norm_range[0]}, {self.norm_range[1]}]")
# Check for NaN or Inf
if np.any(np.isnan(embedding)) or np.any(np.isinf(embedding)):
reasons.append("contains_nan_or_inf")
# Check distance to nearest known cluster
nearest_distance = float("inf")
if self.cluster_centroids is not None:
distances = np.linalg.norm(self.cluster_centroids - embedding, axis=1)
nearest_distance = float(np.min(distances))
if nearest_distance > self.max_cluster_distance:
reasons.append(
f"outlier_embedding: distance {nearest_distance:.4f} "
f"exceeds threshold {self.max_cluster_distance:.4f}"
)
# Check text-embedding coherence (length ratio heuristic)
word_count = len(document_text.split())
if word_count < 5:
reasons.append("suspiciously_short_document")
return EmbeddingValidationResult(
valid=len(reasons) == 0,
reasons=reasons,
embedding_norm=norm,
nearest_cluster_distance=nearest_distance,
)
Semantic Drift Detection
# prometheus-embedding-drift.yaml
# Monitor for sudden changes in the distribution of newly indexed embeddings
groups:
- name: embedding-drift
interval: 5m
rules:
# Track average cosine similarity of new embeddings to their nearest cluster centroid
- record: embedding:avg_cluster_distance:5m
expr: >
avg(embedding_nearest_cluster_distance_bucket) by (index_name)
# Alert when new embeddings are systematically further from known clusters
- alert: EmbeddingSemanticDrift
expr: >
embedding:avg_cluster_distance:5m > 0.45
and
rate(embedding_documents_indexed_total[5m]) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Semantic drift detected in {{ $labels.index_name }}"
description: >
Average distance of new embeddings to cluster centroids has increased
to {{ $value | humanize }}. This may indicate adversarial document
injection or a significant shift in ingested content.
# Alert on sudden spike in embedding validation failures
- alert: EmbeddingValidationFailureSpike
expr: >
rate(embedding_validation_failures_total[5m])
/ rate(embedding_documents_indexed_total[5m]) > 0.1
for: 10m
labels:
severity: critical
annotations:
summary: "{{ $value | humanizePercentage }} of new embeddings failing validation"
Input Sanitisation Before Embedding
# pre_embedding_sanitizer.py - clean documents before they reach the embedding model
import re
import hashlib
from typing import Optional
class PreEmbeddingSanitizer:
"""
Sanitize document content before embedding.
Prevents adversarial text patterns that manipulate embedding geometry.
"""
# Patterns that attempt to manipulate embedding space positioning
ADVERSARIAL_PATTERNS = [
# Repeated keyword stuffing (inflates relevance for specific queries)
(r"(\b\w+\b)(\s+\1){10,}", "keyword_stuffing"),
# Invisible unicode characters used to shift embeddings
(r"[\u200b\u200c\u200d\ufeff\u00ad]{3,}", "invisible_unicode"),
# Base64-encoded payloads hidden in documents
(r"[A-Za-z0-9+/]{100,}={0,2}", "encoded_payload"),
# Homoglyph substitution (mixing Latin and Cyrillic)
(r"[\u0400-\u04ff].*[\x41-\x5a\x61-\x7a]", "homoglyph_mixing"),
]
def __init__(self):
self.seen_hashes = set()
def sanitize(self, text: str) -> tuple[str, list[str]]:
warnings = []
# Check for near-duplicate content (adversarial document flooding)
content_hash = hashlib.sha256(text.strip().lower().encode()).hexdigest()
if content_hash in self.seen_hashes:
warnings.append("duplicate_document")
self.seen_hashes.add(content_hash)
# Check for adversarial patterns
for pattern, category in self.ADVERSARIAL_PATTERNS:
if re.search(pattern, text):
warnings.append(f"adversarial_pattern:{category}")
# Strip invisible unicode
cleaned = re.sub(r"[\u200b\u200c\u200d\ufeff\u00ad]", "", text)
# Normalise whitespace
cleaned = re.sub(r"\s+", " ", cleaned).strip()
return cleaned, warnings
Kubernetes Deployment for Embedding Validation Service
# embedding-validator-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: embedding-validator
namespace: ai-pipeline
spec:
replicas: 2
selector:
matchLabels:
app: embedding-validator
template:
metadata:
labels:
app: embedding-validator
spec:
containers:
- name: validator
image: internal-registry/embedding-validator:1.4.0
ports:
- containerPort: 8080
env:
- name: BASELINE_PATH
value: "/data/baselines/cluster_centroids.npy"
- name: MAX_CLUSTER_DISTANCE
value: "0.55"
- name: EMBEDDING_DIM
value: "1536"
resources:
requests:
cpu: 200m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
volumeMounts:
- name: baseline-data
mountPath: /data/baselines
readOnly: true
livenessProbe:
httpGet:
path: /healthz
port: 8080
periodSeconds: 30
readinessProbe:
httpGet:
path: /ready
port: 8080
periodSeconds: 10
volumes:
- name: baseline-data
persistentVolumeClaim:
claimName: embedding-baselines-pvc
---
apiVersion: v1
kind: Service
metadata:
name: embedding-validator
namespace: ai-pipeline
spec:
selector:
app: embedding-validator
ports:
- port: 8080
targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: embedding-validator-policy
namespace: ai-pipeline
spec:
podSelector:
matchLabels:
app: embedding-validator
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: document-ingestion
ports:
- port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: vector-store
ports:
- port: 6333
Expected Behaviour
- All documents pass through the sanitizer before embedding generation
- Embedding validation rejects vectors with anomalous norms, dimensions, or cluster distances
- Duplicate documents are detected and flagged before indexing
- Semantic drift alerts fire within 15 minutes of sustained anomalous ingestion
- Validation failure rate above 10% triggers a critical alert and pauses ingestion
- Adversarial text patterns (keyword stuffing, invisible unicode, homoglyphs) are stripped or flagged before embedding
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Cluster distance threshold | Rejects embeddings far from known topics | Legitimate new topics get flagged as anomalous | Retrain baselines monthly. Allow manual override for new document categories with approval. |
| Duplicate detection | Prevents flooding attacks | Legitimate re-indexing of updated documents is blocked | Use content-hash with version tracking. Allow updates that change >20% of content. |
| Pre-embedding sanitisation | Removes adversarial text patterns | Overly aggressive cleaning may alter document meaning | Log all modifications. Allow human review of sanitised documents before final indexing. |
| Embedding validation latency | Adds 10-50ms per document at ingestion time | Slows bulk indexing operations | Run validation asynchronously for batch ingestion. Synchronous for real-time. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Baseline too old | New legitimate documents consistently rejected | Validation failure rate climbs steadily over weeks | Rebuild cluster centroids from current corpus. Schedule monthly baseline refresh. |
| Adversarial bypass | Attacker crafts document that passes all checks but poisons retrieval | Retrieved results degrade in quality; user reports inaccurate answers | Manual review of recently indexed documents. Add the bypass technique to the pattern list. |
| Validator service down | Documents indexed without validation | Health check failures; gap in validation metrics | Ingestion pipeline should block (not skip) when validator is unreachable. Queue documents for later validation. |
| False positive on legitimate content | Good documents rejected at ingestion | Data team reports missing documents; ingestion rejection rate spikes | Review and widen cluster distance threshold. Add the document category to the baseline. |
When to Consider a Managed Alternative
Embedding security requires maintaining baselines, updating adversarial pattern detection, and monitoring drift across potentially millions of vectors. This operational load scales with corpus size.
- Pinecone (#147): Managed vector database with built-in metadata filtering and access control. Monitoring dashboards for index health.
- Weaviate (#148): Self-hosted or cloud-managed vector database with OIDC authentication, multi-tenancy, and RBAC for collections.
- Qdrant (#149): High-performance vector database with payload filtering, snapshots for rollback, and access control.
Premium content pack: Embedding security configuration pack. Baseline computation scripts, embedding validation service (Python), Prometheus alerting rules for semantic drift, pre-embedding sanitisation library, and adversarial embedding detection test suite.