AI Model Cards in Production: Documenting Capabilities, Limitations, and Security Properties
Problem
Every production AI model has boundaries: input domains where it performs well, edge cases where it fails, and security properties that constrain how it should be deployed. These boundaries exist whether or not anyone documents them. Undocumented boundaries become production incidents.
A model card is a structured document that travels with the model artifact. It declares what the model can do, what it cannot do, and what it should never be asked to do. The concept originated in Google's 2019 paper "Model Cards for Model Reporting", but most implementations remain academic: they describe models for research publication, not for production deployment.
Production model cards must answer questions that research model cards ignore. Can this model be exposed to untrusted input? What happens when input distribution shifts? Has the model been tested for adversarial robustness? What data was used for training, and does that create legal exposure? Can the model produce outputs that violate content policies?
Without machine-readable model cards enforced at deployment time, teams deploy models they do not fully understand into environments where failures cause real harm. The model that scores 94% accuracy on the test set may score 60% on the population it actually encounters in production, and nobody discovers this until a customer complains.
Threat Model
- Adversary: Primarily operational risk from deploying models outside their validated boundaries; secondarily, attackers who probe for and exploit undocumented model weaknesses.
- Key requirements: (1) Every model in production has a machine-readable model card. (2) Deployment pipelines validate that the deployment context matches the model’s documented capabilities. (3) Security properties are explicit and enforceable.
- Failure scenario: A model trained on English-language financial documents is deployed to process multilingual input. Performance degrades silently. Downstream decisions based on low-confidence outputs cause financial losses before anyone notices.
Configuration
Model Card Schema
Define a schema that is both human-readable and machine-parseable. YAML works well because it lives in version control alongside the model code.
```yaml
# model-card.yaml
# This file is stored in the model's artifact registry alongside the model weights.
# It is validated at build time and checked at deployment time.
schema_version: "1.0"
model_id: "fraud-detection-v4"
model_version: "4.1.2"
model_hash: "sha256:a3f2c1d8e9b0..."

# Section 1: Model Details
model_details:
  name: "Transaction Fraud Detection Model"
  architecture: "transformer_encoder"
  framework: "PyTorch 2.3"
  task: "binary_classification"
  created_date: "2026-03-15"
  created_by: "ml-platform-team"
  license: "proprietary"
  contact: "ml-platform@company.com"

# Section 2: Training Data Provenance
training_data:
  sources:
    - name: "internal_transaction_history"
      records: 18000000
      date_range: "2022-01-01 to 2025-12-31"
      geographic_scope: ["US", "EU", "UK"]
      pii_handling: "tokenised_before_training"
      consent_basis: "legitimate_interest"
    - name: "synthetic_fraud_samples"
      records: 500000
      generation_method: "rule_based_augmentation"
      purpose: "address_class_imbalance"
  preprocessing:
    - "currency_normalisation_to_usd"
    - "transaction_amount_log_scaling"
    - "merchant_category_encoding"
  excluded_features:
    - "customer_name"
    - "customer_address"
    - "customer_ethnicity"
    - "customer_gender"

# Section 3: Intended Use
intended_use:
  primary_use: "Flag potentially fraudulent card transactions for human review"
  intended_users: ["fraud_operations_team"]
  deployment_context:
    - "real_time_transaction_scoring"
    - "batch_retroactive_analysis"
  out_of_scope_uses:
    - "autonomous_transaction_blocking_without_human_review"
    - "customer_creditworthiness_assessment"
    - "law_enforcement_investigation"

# Section 4: Performance
performance:
  primary_metric: "f1_score"
  evaluation_datasets:
    - name: "holdout_test_set"
      records: 2000000
      metrics:
        accuracy: 0.967
        precision: 0.91
        recall: 0.94
        f1_score: 0.925
        auc_roc: 0.98
    - name: "production_shadow_30d"
      records: 45000000
      metrics:
        accuracy: 0.959
        precision: 0.87
        recall: 0.92
        f1_score: 0.894
        auc_roc: 0.97
  performance_boundaries:
    - condition: "transaction_amount_below_5_usd"
      impact: "precision drops to 0.72 due to limited training samples"
    - condition: "cryptocurrency_merchant_category"
      impact: "recall drops to 0.68; category underrepresented in training data"
    - condition: "non_usd_eur_gbp_currencies"
      impact: "f1 drops to 0.81; currency normalisation introduces noise"

# Section 5: Fairness
fairness:
  tested_dimensions:
    - dimension: "geographic_region"
      metric: "equalised_odds_difference"
      result: 0.04
      threshold: 0.05
      status: "pass"
    - dimension: "transaction_amount_quartile"
      metric: "demographic_parity_difference"
      result: 0.08
      threshold: 0.05
      status: "fail_monitored"
      mitigation: "additional training data collection for Q1 transactions in progress"

# Section 6: Known Failure Modes
failure_modes:
  - name: "novel_fraud_pattern"
    description: "Model cannot detect fraud patterns not present in training data"
    likelihood: "medium"
    impact: "high"
    mitigation: "quarterly retraining; human review of low-confidence scores"
  - name: "adversarial_transaction_structuring"
    description: "Attackers split transactions to stay below detection thresholds"
    likelihood: "high"
    impact: "medium"
    mitigation: "session-level aggregation model runs in parallel"
  - name: "data_pipeline_schema_drift"
    description: "Upstream data format changes cause silent input corruption"
    likelihood: "low"
    impact: "critical"
    mitigation: "Great Expectations validation on every input batch"

# Section 7: Security Properties
security:
  adversarial_robustness:
    tested: true
    method: "FGSM and PGD perturbations on numerical features"
    result: "accuracy degrades less than 3% under l-inf perturbation of 0.1"
  input_validation:
    schema_enforced: true
    max_input_size: "4KB"
    allowed_types: ["float64", "int64", "categorical"]
    injection_protection: "input features are numerical; no free-text fields"
  model_artifact_integrity:
    signing: "cosign"
    signature_verification: "required_at_deployment"
    artifact_registry: "internal_oci_registry"
  data_exfiltration_risk: "low"
  prompt_injection_applicable: false
```
Model Card Validation in CI/CD
```python
# validate_model_card.py
# Runs in CI/CD to ensure every model has a complete, valid model card.
import sys

import yaml

REQUIRED_SECTIONS = [
    "schema_version", "model_id", "model_version", "model_hash",
    "model_details", "training_data", "intended_use",
    "performance", "failure_modes", "security",
]
REQUIRED_SECURITY_FIELDS = [
    "adversarial_robustness", "input_validation",
    "model_artifact_integrity",
]
REQUIRED_PERFORMANCE_FIELDS = [
    "primary_metric", "evaluation_datasets", "performance_boundaries",
]


def validate(card_path: str) -> list:
    """Validate a model card for production readiness."""
    with open(card_path) as f:
        card = yaml.safe_load(f)

    errors = []

    # Check required top-level sections
    for section in REQUIRED_SECTIONS:
        if section not in card:
            errors.append(f"Missing required section: {section}")

    # Check security section completeness
    if "security" in card:
        for field in REQUIRED_SECURITY_FIELDS:
            if field not in card["security"]:
                errors.append(f"Missing security field: {field}")
        # Fail closed: a missing or false "tested" flag both count as untested
        if not card["security"].get("adversarial_robustness", {}).get("tested"):
            errors.append("Adversarial robustness testing has not been performed")

    # Check performance section
    if "performance" in card:
        for field in REQUIRED_PERFORMANCE_FIELDS:
            if field not in card["performance"]:
                errors.append(f"Missing performance field: {field}")
        if not card["performance"].get("performance_boundaries"):
            errors.append("No performance boundaries documented")

    # Check training data provenance
    if "training_data" in card:
        sources = card["training_data"].get("sources", [])
        if not sources:
            errors.append("No training data sources documented")
        for source in sources:
            if "pii_handling" not in source and "generation_method" not in source:
                errors.append(
                    f"Training data source '{source.get('name')}' missing "
                    "pii_handling (real data) or generation_method (synthetic data)"
                )

    # Check failure modes
    if "failure_modes" in card:
        for mode in card["failure_modes"]:
            if "mitigation" not in mode:
                errors.append(f"Failure mode '{mode.get('name')}' has no mitigation documented")

    return errors


if __name__ == "__main__":
    card_path = sys.argv[1]
    errors = validate(card_path)
    if errors:
        print(f"Model card validation FAILED with {len(errors)} errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    print("Model card validation PASSED")
```
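The required-section gate can be exercised quickly in a unit test. A minimal sketch, assuming PyYAML is available; the deliberately incomplete card below is illustrative:

```python
import yaml

# The same required-section list the CI validator enforces
REQUIRED_SECTIONS = [
    "schema_version", "model_id", "model_version", "model_hash",
    "model_details", "training_data", "intended_use",
    "performance", "failure_modes", "security",
]

# A deliberately incomplete card: identity fields only, no documentation sections
incomplete_card = yaml.safe_load("""
schema_version: "1.0"
model_id: "demo-model"
model_version: "0.1.0"
model_hash: "sha256:deadbeef"
""")

missing = [s for s in REQUIRED_SECTIONS if s not in incomplete_card]
print(f"{len(missing)} missing sections: {missing}")
```

A card like this would be rejected in CI: all six documentation sections are absent even though the identity fields are present.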
Deployment-Time Boundary Check
The model card is not just documentation. It enforces boundaries at deployment time.
```python
# deployment_boundary_check.py
# Runs as a Kubernetes admission webhook or pre-deployment hook.
# Compares the deployment context against the model card's intended use.
import sys

import yaml


def check_deployment_boundaries(card_path: str, deployment_config: dict) -> list:
    """Verify that a deployment context matches model card boundaries."""
    with open(card_path) as f:
        card = yaml.safe_load(f)

    violations = []
    intended = card.get("intended_use", {})

    # Check deployment context is within scope
    allowed_contexts = intended.get("deployment_context", [])
    requested_context = deployment_config.get("context")
    if requested_context and requested_context not in allowed_contexts:
        violations.append(
            f"Deployment context '{requested_context}' not in allowed contexts: {allowed_contexts}"
        )

    # Check for out-of-scope uses
    out_of_scope = intended.get("out_of_scope_uses", [])
    declared_use = deployment_config.get("use_case")
    if declared_use in out_of_scope:
        violations.append(
            f"Use case '{declared_use}' is explicitly out of scope for this model"
        )

    # Check model artifact integrity
    security = card.get("security", {})
    integrity = security.get("model_artifact_integrity", {})
    if integrity.get("signature_verification") == "required_at_deployment":
        if not deployment_config.get("signature_verified"):
            violations.append("Model artifact signature not verified")

    # Check geographic scope if applicable
    training_sources = card.get("training_data", {}).get("sources", [])
    deployment_region = deployment_config.get("region")
    if deployment_region:
        all_scopes = set()
        for source in training_sources:
            scope = source.get("geographic_scope", [])
            if isinstance(scope, list):
                all_scopes.update(scope)
            else:
                all_scopes.add(scope)
        if all_scopes and deployment_region not in all_scopes:
            violations.append(
                f"Deployment region '{deployment_region}' outside training data scope: {all_scopes}"
            )

    return violations


if __name__ == "__main__":
    card_path = sys.argv[1]
    deployment = {
        "context": sys.argv[2] if len(sys.argv) > 2 else None,
        "use_case": sys.argv[3] if len(sys.argv) > 3 else None,
        "region": sys.argv[4] if len(sys.argv) > 4 else None,
        "signature_verified": True,
    }
    violations = check_deployment_boundaries(card_path, deployment)
    if violations:
        print("DEPLOYMENT BLOCKED - boundary violations:")
        for v in violations:
            print(f"  - {v}")
        sys.exit(1)
    print("Deployment boundaries check PASSED")
```
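The context and use-case checks can be seen end to end on an in-memory card, without touching the file system. A minimal sketch; the card fragment and deployment request below are illustrative:

```python
# Illustrative card fragment mirroring the intended_use section of the schema
card = {
    "intended_use": {
        "deployment_context": [
            "real_time_transaction_scoring",
            "batch_retroactive_analysis",
        ],
        "out_of_scope_uses": ["customer_creditworthiness_assessment"],
    }
}

# A deployment request that violates both the context and use-case boundaries
deployment = {
    "context": "streaming_feature_store",
    "use_case": "customer_creditworthiness_assessment",
}

violations = []
intended = card["intended_use"]
if deployment["context"] not in intended["deployment_context"]:
    violations.append(f"context '{deployment['context']}' not allowed")
if deployment["use_case"] in intended["out_of_scope_uses"]:
    violations.append(f"use case '{deployment['use_case']}' is out of scope")

print(f"{len(violations)} violations; deployment would be blocked")
```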
Performance Boundary Monitoring
After deployment, continuously monitor whether the model is operating within its documented boundaries.
```yaml
# prometheus-rules-model-boundaries.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: model-boundary-alerts
  namespace: monitoring
spec:
  groups:
    - name: model-card-boundaries
      interval: 60s
      rules:
        # Alert when model serves input outside documented boundaries
        - alert: ModelInputOutsideBoundary
          expr: |
            sum(rate(model_input_boundary_violation_total[5m])) by (model_id, boundary_name) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Model {{ $labels.model_id }} receiving input outside documented boundary: {{ $labels.boundary_name }}"
            runbook: "Check model card performance_boundaries section. Evaluate if model is safe for this input domain."
        # Alert when model performance drifts below documented metrics
        - alert: ModelPerformanceBelowCard
          expr: |
            model_live_f1_score < on(model_id) model_card_documented_f1_score * 0.95
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "Model {{ $labels.model_id }} live F1 score is below 95% of documented model card value"
            runbook: "Investigate input distribution shift. Compare live data profile against training data profile."
        # Alert when confidence distribution shifts
        - alert: ModelConfidenceDistributionShift
          expr: |
            histogram_quantile(0.5, rate(model_output_confidence_bucket[1h])) < 0.7
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Median model confidence dropped below 0.7, indicating potential distribution shift"
```
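For the ModelPerformanceBelowCard rule to fire, something must publish the card's documented metric as the `model_card_documented_f1_score` series. One approach is a small exporter that renders the card into Prometheus text exposition format; a sketch under stated assumptions (the helper name is illustrative, and it reads the first evaluation dataset, assumed to be the holdout test set):

```python
def documented_f1_exposition(card: dict) -> str:
    """Render a card's documented F1 score in Prometheus text exposition format.

    Assumes the first evaluation dataset is the canonical one; a real exporter
    would select the dataset matching the deployment context.
    """
    model_id = card["model_id"]
    f1 = card["performance"]["evaluation_datasets"][0]["metrics"]["f1_score"]
    return (
        "# TYPE model_card_documented_f1_score gauge\n"
        f'model_card_documented_f1_score{{model_id="{model_id}"}} {f1}\n'
    )


# Illustrative card fragment with the fields the exporter reads
example_card = {
    "model_id": "fraud-detection-v4",
    "performance": {
        "evaluation_datasets": [
            {"name": "holdout_test_set", "metrics": {"f1_score": 0.925}}
        ]
    },
}
print(documented_f1_exposition(example_card))
```

Serving this text from a `/metrics` endpoint (or pushing it via a sidecar) gives Prometheus a static series to compare live scores against.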
Model Card Registry API
```python
# model_card_registry.py
# Simple registry that stores and serves model cards.
# Integrates with artifact registries (OCI, MLflow).
from pathlib import Path

import yaml
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Model Card Registry")
CARDS_DIR = Path("/data/model-cards")


@app.get("/cards/{model_id}/{model_version}")
def get_card(model_id: str, model_version: str):
    """Retrieve a model card by model ID and version."""
    # Reject path traversal attempts in path parameters
    if ".." in model_id or ".." in model_version:
        raise HTTPException(status_code=400, detail="Invalid model identifier")
    card_path = CARDS_DIR / model_id / f"{model_version}.yaml"
    if not card_path.exists():
        raise HTTPException(status_code=404, detail="Model card not found")
    with open(card_path) as f:
        return yaml.safe_load(f)


@app.get("/cards/{model_id}/{model_version}/security")
def get_security_properties(model_id: str, model_version: str):
    """Return only the security section for quick deployment checks."""
    card = get_card(model_id, model_version)
    return card.get("security", {})


@app.get("/cards/{model_id}/{model_version}/boundaries")
def get_boundaries(model_id: str, model_version: str):
    """Return performance boundaries and intended use for deployment validation."""
    card = get_card(model_id, model_version)
    return {
        "intended_use": card.get("intended_use", {}),
        "performance_boundaries": card.get("performance", {}).get("performance_boundaries", []),
        "failure_modes": card.get("failure_modes", []),
    }
```
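Because the card records a `model_hash`, a deployment hook that fetches the card from this registry can also verify the artifact it is about to serve. A minimal sketch; the helper name is an assumption, and this complements cosign signature verification rather than replacing it:

```python
import hashlib


def artifact_matches_card(artifact_bytes: bytes, card: dict) -> bool:
    """Compare an artifact's sha256 digest against the card's model_hash field."""
    digest = "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()
    return digest == card.get("model_hash")


# Illustrative: a card whose hash was recorded at build time
weights = b"...model weights bytes..."
card = {"model_hash": "sha256:" + hashlib.sha256(weights).hexdigest()}

print(artifact_matches_card(weights, card))            # matching artifact
print(artifact_matches_card(b"tampered", card))        # modified artifact
```

A hash mismatch at this point means the artifact in the serving environment is not the one the card documents, and the deployment should be blocked.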
Expected Behaviour
- Every model artifact in the registry has an accompanying model card validated against the schema
- CI/CD pipeline rejects model artifacts with missing or incomplete model cards
- Deployment pipeline validates that the target deployment context matches the model card’s intended use
- Models deployed outside their documented geographic scope or use case are blocked automatically
- Performance monitoring alerts fire when live metrics drift below model card documented values
- Security properties (adversarial robustness, input validation, artifact signing) are verifiable at any time
- Model cards are versioned alongside model code and artifacts
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Mandatory model cards in CI/CD | Every model is documented before deployment | ML engineers spend 30-60 minutes per model version writing and updating cards | Provide templates. Auto-populate fields from training metadata where possible. |
| Deployment boundary checking | Prevents models from serving outside their validated domain | False positives block legitimate deployments when use cases evolve | Allow boundary overrides with explicit sign-off and documented justification. |
| Performance boundary monitoring | Catches distribution shift before it causes harm | Alert fatigue if boundaries are set too tightly | Set boundaries at 95% of documented metrics. Tune per-model based on operational experience. |
| Model card registry API | Centralised, queryable model documentation | Another service to maintain and keep available | Deploy as a lightweight FastAPI service. Back with a file system or object storage. No database required. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Model card does not match actual model behaviour | Card says accuracy is 0.96, live accuracy is 0.85 | Performance monitoring detects drift from card values | Investigate. Update card if model was retrained. Retrain if model degraded. |
| Deployment boundary check has false positive | Legitimate deployment blocked | Engineering team escalates blocked deployment | Review boundary definitions. Add the new context to intended_use if validated. |
| Model card schema evolves but old cards not migrated | Old models missing newly required fields | Validation pipeline fails for old model versions | Schema migration script updates existing cards. Add defaults for new required fields. |
| Training data provenance is incomplete | Card lists sources but not PII handling or consent basis | Model card validation catches missing fields | Work with data governance team to document provenance retroactively. |
When to Consider a Managed Alternative
Model card management becomes complex when organisations operate dozens of models across multiple teams and deployment environments.
- Vanta (#169): Integrates model card documentation into broader compliance workflows. Tracks which models have complete documentation and flags gaps.
- Grafana Cloud (#108): Dashboards that overlay model card documented metrics against live performance metrics. Visual boundary monitoring across all models.
- Axiom (#112): Store and query model card change history. Track which card version was active when an incident occurred.
Premium content pack: Model card templates pack. Complete YAML schemas for classification, NLP, computer vision, and generative models. CI/CD validation scripts (Python), deployment boundary checker, Prometheus alert rules for performance boundary monitoring, and FastAPI model card registry with OCI artifact integration.