Building an AI Governance Pipeline: Automated Checks from Training to Production

Problem

AI governance in most organisations is a manual process. A model is trained, someone writes a document, a committee meets, approvals are collected via email or Slack, and the model ships. The governance artifacts (if they exist) are disconnected from the model artifacts. Nobody can tell you which governance checks were applied to the model currently serving traffic in production.

This disconnect creates two problems. First, governance becomes a bottleneck. ML engineers wait days or weeks for approvals while models that could be improving production systems sit in staging. Second, governance becomes theatrical. Documents are written to satisfy a process, not to catch real issues. The bias test results in the approval document may not correspond to the model version being deployed.

Governance-as-code solves both problems. Governance checks become automated pipeline stages that run alongside training, evaluation, and deployment. Every model artifact in the registry has a machine-verifiable governance record. The governance record is cryptographically linked to the model artifact it describes. If the model changes, the governance checks re-run automatically.

This approach is not about removing humans from governance decisions. It is about ensuring that humans make decisions based on accurate, current information, and that the evidence supporting those decisions is permanently linked to the model.

Threat Model

  • Adversary: Ungoverned models reaching production. Stale governance artifacts that do not match the deployed model. Governance processes that slow delivery without improving safety.
  • Key requirements: (1) Every model in production has a verifiable governance record. (2) Governance checks run automatically on every model version. (3) Human approval is required for high-risk models, with the approval cryptographically linked to the model artifact. (4) Governance status is visible in real time.
  • Failure scenario: A model passes governance review for version 3.1. An ML engineer retrains with new data and deploys version 3.2 without re-running governance checks. Version 3.2 has a bias issue that version 3.1 did not, but the governance record shows “approved” because it references version 3.1.

Configuration

Governance Pipeline Architecture

The governance pipeline runs as a set of stages in the ML CI/CD pipeline. Each stage produces a signed attestation that is stored alongside the model artifact.

# governance-pipeline.yaml
# Defines the governance stages that every model must pass.
# Runs as part of the ML CI/CD pipeline (GitHub Actions, GitLab CI, Argo Workflows).

stages:
  - name: "data_provenance"
    description: "Verify training data sources, lineage, and consent"
    required: true
    checks:
      - "training_data_sources_documented"
      - "pii_handling_verified"
      - "consent_basis_documented"
      - "data_retention_policy_compliant"
    blocking: true

  - name: "model_card"
    description: "Validate model card completeness and accuracy"
    required: true
    checks:
      - "model_card_schema_valid"
      - "performance_metrics_populated"
      - "failure_modes_documented"
      - "security_properties_documented"
      - "intended_use_defined"
    blocking: true

  - name: "bias_and_fairness"
    description: "Run automated bias testing across protected attributes"
    required: true
    checks:
      - "demographic_parity_within_threshold"
      - "equalised_odds_within_threshold"
      - "disparate_impact_ratio_above_minimum"
    blocking: true

  - name: "safety_evaluation"
    description: "Test for adversarial robustness and edge case handling"
    required: true
    checks:
      - "adversarial_perturbation_test_passed"
      - "edge_case_inputs_handled"
      - "output_boundary_verified"
    blocking: true

  - name: "security_review"
    description: "Verify model artifact integrity and deployment security"
    required: true
    checks:
      - "model_artifact_signed"
      - "inference_endpoint_tls_configured"
      - "input_validation_configured"
      - "rate_limiting_configured"
    blocking: true

  - name: "risk_classification"
    description: "Classify model by regulatory risk tier"
    required: true
    checks:
      - "risk_tier_assigned"
      - "required_controls_for_tier_present"
    blocking: true

  - name: "human_approval"
    description: "Human review and sign-off for high-risk models"
    required_for_risk_tiers: ["high_risk"]
    approval_roles: ["ml_lead", "compliance_officer"]
    blocking: true
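Since this YAML is the contract every model is held to, the config itself deserves validation before any stage runs; a typo should fail loudly rather than silently skip a stage. A minimal structural check, sketched against the fields used above (the function name is an assumption):

```python
def validate_pipeline_config(config: dict) -> list:
    """Return a list of structural problems in a governance pipeline config.

    Checks only the fields the pipeline above relies on: every stage needs
    a name, and either automated checks or human approval roles.
    """
    errors = []
    stages = config.get("stages", [])
    if not stages:
        errors.append("config defines no stages")
    for i, stage in enumerate(stages):
        label = stage.get("name", f"stage[{i}]")
        if "name" not in stage:
            errors.append(f"stage[{i}] has no name")
        if not stage.get("checks") and not stage.get("approval_roles"):
            errors.append(f"{label} has neither checks nor approval_roles")
        if "blocking" in stage and not isinstance(stage["blocking"], bool):
            errors.append(f"{label} has non-boolean 'blocking'")
    return errors
```

Run it immediately after loading the YAML in the runner and abort on a non-empty result.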

Governance-as-Code Implementation

# governance_runner.py
# Executes governance checks and produces signed attestations.

import hashlib
import json
import subprocess
import sys
import yaml
from datetime import datetime, timezone
from dataclasses import dataclass, field, asdict

@dataclass
class GovernanceCheck:
    name: str
    stage: str
    passed: bool
    details: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class GovernanceRecord:
    model_id: str
    model_version: str
    model_hash: str
    pipeline_run_id: str
    checks: list = field(default_factory=list)
    human_approvals: list = field(default_factory=list)

    @property
    def all_passed(self):
        return all(c.passed for c in self.checks)

    @property
    def record_hash(self):
        content = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()

def run_governance_pipeline(
    model_id: str,
    model_version: str,
    model_artifact_path: str,
    pipeline_config_path: str = "governance-pipeline.yaml"
) -> GovernanceRecord:
    """Execute the full governance pipeline for a model."""

    # Compute model artifact hash
    with open(model_artifact_path, "rb") as f:
        model_hash = hashlib.sha256(f.read()).hexdigest()

    with open(pipeline_config_path) as f:
        config = yaml.safe_load(f)

    record = GovernanceRecord(
        model_id=model_id,
        model_version=model_version,
        model_hash=model_hash,
        pipeline_run_id=f"gov-{datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')}"
    )

    for stage in config["stages"]:
        stage_name = stage["name"]
        print(f"\n--- Running governance stage: {stage_name} ---")

        for check_name in stage.get("checks", []):
            result = _execute_check(check_name, model_id, model_version)
            record.checks.append(GovernanceCheck(
                name=check_name,
                stage=stage_name,
                passed=result["passed"],
                details=result["details"]
            ))

            status = "PASS" if result["passed"] else "FAIL"
            print(f"  [{status}] {check_name}: {result['details']}")

            if not result["passed"] and stage.get("blocking", False):
                print(f"\nBLOCKING FAILURE in stage '{stage_name}'. Pipeline halted.")
                return record

    return record

def _execute_check(check_name: str, model_id: str, model_version: str) -> dict:
    """Execute a single governance check. Returns {passed: bool, details: str}."""
    check_script = f"checks/{check_name}.py"
    try:
        result = subprocess.run(
            ["python", check_script, model_id, model_version],
            capture_output=True, text=True, timeout=300
        )
        try:
            output = json.loads(result.stdout) if result.stdout else {}
        except json.JSONDecodeError:
            # A check that prints non-JSON output should not be marked failed
            # for that reason alone; its exit code is the source of truth
            output = {}
        return {
            "passed": result.returncode == 0,
            "details": output.get("details", result.stderr or "No details")
        }
    except subprocess.TimeoutExpired:
        return {"passed": False, "details": "Check timed out after 300 seconds"}
    except Exception as e:
        return {"passed": False, "details": f"Check execution error: {str(e)}"}

def sign_governance_record(record: GovernanceRecord, key_path: str) -> str:
    """Sign the governance record with cosign for tamper detection."""
    record_json = json.dumps(asdict(record), sort_keys=True)
    record_path = f"/tmp/governance-{record.pipeline_run_id}.json"

    with open(record_path, "w") as f:
        f.write(record_json)

    # Sign with cosign
    subprocess.run([
        "cosign", "sign-blob",
        "--key", key_path,
        "--output-signature", f"{record_path}.sig",
        record_path
    ], check=True)

    return record_path

CI/CD Integration

# .github/workflows/ml-governance-pipeline.yaml
name: ML Governance Pipeline

on:
  push:
    paths:
      - 'models/**'
      - 'training/**'

jobs:
  governance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install -r requirements-governance.txt

      - name: Compute model artifact hash
        id: model_hash
        run: |
          HASH=$(sha256sum models/fraud-detection/model.onnx | cut -d' ' -f1)
          echo "hash=$HASH" >> "$GITHUB_OUTPUT"

      - name: Stage 1 - Data Provenance
        run: |
          python checks/training_data_sources_documented.py fraud-detection-v4 4.1.2
          python checks/pii_handling_verified.py fraud-detection-v4 4.1.2
          python checks/consent_basis_documented.py fraud-detection-v4 4.1.2

      - name: Stage 2 - Model Card Validation
        run: |
          python validate_model_card.py models/fraud-detection/model-card.yaml

      - name: Stage 3 - Bias and Fairness Testing
        run: |
          python run_bias_tests.py \
            bias-testing-config.yaml \
            models/fraud-detection/eval-predictions.parquet

      - name: Stage 4 - Safety Evaluation
        run: |
          python safety_evaluations.py \
            --model models/fraud-detection/model.onnx \
            --config safety-eval-config.yaml

      - name: Stage 5 - Security Review
        run: |
          # Verify model artifact is signed
          cosign verify-blob \
            --key cosign.pub \
            --signature models/fraud-detection/model.onnx.sig \
            models/fraud-detection/model.onnx

          # Validate deployment security config
          python checks/deployment_security.py fraud-detection-v4 4.1.2

      - name: Stage 6 - Risk Classification
        id: risk
        run: |
          RESULT=$(python classify_ai_system.py "$(cat models/fraud-detection/metadata.json)")
          echo "risk_tier=$(echo $RESULT | jq -r '.risk_tier')" >> "$GITHUB_OUTPUT"
          echo "$RESULT"

      - name: Stage 7 - Human Approval Gate (high-risk only)
        if: steps.risk.outputs.risk_tier == 'high_risk'
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ secrets.GITHUB_TOKEN }}
          approvers: ml-leads,compliance-officers
          minimum-approvals: 2
          issue-title: "Governance approval required: fraud-detection-v4 v4.1.2"
          issue-body: |
            Model: fraud-detection-v4 v4.1.2
            Risk tier: high_risk
            Model hash: ${{ steps.model_hash.outputs.hash }}
            All automated governance checks passed.
            Please review the governance report and approve.

      - name: Generate Governance Record
        run: |
          python governance_runner.py \
            --model-id fraud-detection-v4 \
            --model-version 4.1.2 \
            --artifact models/fraud-detection/model.onnx \
            --output governance-record.json

      - name: Sign Governance Record
        env:
          # The secret holds key material, not a file path; cosign reads it
          # from the environment via the env:// scheme (add COSIGN_PASSWORD
          # if the key is encrypted)
          COSIGN_PRIVATE_KEY: ${{ secrets.COSIGN_KEY }}
        run: |
          cosign sign-blob \
            --key env://COSIGN_PRIVATE_KEY \
            --output-signature governance-record.json.sig \
            governance-record.json

      - name: Store Governance Record
        run: |
          # Store alongside model artifact in registry
          aws s3 cp governance-record.json \
            s3://ml-governance/fraud-detection-v4/4.1.2/governance-record.json
          aws s3 cp governance-record.json.sig \
            s3://ml-governance/fraud-detection-v4/4.1.2/governance-record.json.sig

Model Approval Workflow

# approval_workflow.py
# Manages human approval workflows for high-risk model deployments.

from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"

@dataclass
class ApprovalRequest:
    model_id: str
    model_version: str
    model_hash: str
    governance_record_hash: str
    risk_tier: str
    requested_by: str
    required_approvers: list
    required_approvals: int
    approvals: list
    status: ApprovalStatus = ApprovalStatus.PENDING
    created_at: str = ""
    expires_at: str = ""

class ApprovalManager:
    def __init__(self, storage_backend):
        self.storage = storage_backend

    def create_request(self, model_id: str, model_version: str,
                       model_hash: str, governance_record_hash: str,
                       risk_tier: str, requested_by: str) -> ApprovalRequest:
        """Create a new approval request for a model deployment."""

        # Determine required approvers based on risk tier
        if risk_tier == "high_risk":
            required_approvers = ["ml_lead", "compliance_officer"]
            required_approvals = 2
        else:
            required_approvers = ["ml_lead"]
            required_approvals = 1

        request = ApprovalRequest(
            model_id=model_id,
            model_version=model_version,
            model_hash=model_hash,
            governance_record_hash=governance_record_hash,
            risk_tier=risk_tier,
            requested_by=requested_by,
            required_approvers=required_approvers,
            required_approvals=required_approvals,
            approvals=[],
            created_at=datetime.now(timezone.utc).isoformat(),
            expires_at=""  # Set based on policy
        )

        self.storage.save(request)
        return request

    def approve(self, model_id: str, model_version: str,
                approver_id: str, approver_role: str, comment: str = "") -> dict:
        """Record an approval for a model deployment."""
        request = self.storage.get(model_id, model_version)

        if request.status != ApprovalStatus.PENDING:
            return {"error": f"Request is {request.status.value}, not pending"}

        if approver_role not in request.required_approvers:
            return {"error": f"Role '{approver_role}' is not a required approver"}

        # Reject duplicate approvals from the same person
        if any(a["approver_id"] == approver_id for a in request.approvals):
            return {"error": f"'{approver_id}' has already approved this request"}

        approval = {
            "approver_id": approver_id,
            "approver_role": approver_role,
            "comment": comment,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "governance_record_hash_verified": request.governance_record_hash
        }

        request.approvals.append(approval)

        # Count approvals from distinct roles, so that two ml_lead approvals
        # cannot satisfy a requirement for ml_lead plus compliance_officer
        distinct_roles = {a["approver_role"] for a in request.approvals}
        if len(distinct_roles) >= request.required_approvals:
            request.status = ApprovalStatus.APPROVED

        self.storage.save(request)
        return {"status": request.status.value, "approvals": len(request.approvals)}

Continuous Compliance Monitoring

After deployment, continuously verify that the governance record matches the running model.

# prometheus-rules-governance.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: governance-compliance-alerts
  namespace: monitoring
spec:
  groups:
    - name: ai-governance
      interval: 300s
      rules:
        # Alert when a model in production has no governance record
        - alert: ModelMissingGovernanceRecord
          expr: |
            model_serving_active == 1
            unless on(model_id, model_version)
            governance_record_exists == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Model {{ $labels.model_id }} v{{ $labels.model_version }} is serving traffic without a governance record"
            runbook: "Block traffic to this model. Run governance pipeline before re-enabling."

        # Alert when the governance record hash does not match the deployed
        # model hash. Hashes cannot be Prometheus sample values (samples are
        # floats), so carry them as labels on info-style metrics and detect
        # mismatches via label matching.
        - alert: GovernanceRecordMismatch
          expr: |
            model_deployed_info
            unless on(model_id, model_version, model_hash)
            governance_record_info
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Governance record for {{ $labels.model_id }} does not match deployed artifact hash"
            runbook: "Model may have been modified after governance approval. Investigate immediately."

        # Alert when governance approval has expired
        - alert: GovernanceApprovalExpired
          expr: |
            (time() - governance_approval_timestamp) > 7776000
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Governance approval for {{ $labels.model_id }} is older than 90 days"
            runbook: "Re-run governance pipeline to renew approval. Check for regulation or policy changes since last approval."

        # Alert when required governance stage was skipped
        - alert: GovernanceStageSkipped
          expr: |
            governance_required_stages_total - governance_completed_stages_total > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Model {{ $labels.model_id }} has {{ $value }} skipped governance stages"

Governance Dashboard

{
  "dashboard": {
    "title": "AI Governance Pipeline",
    "panels": [
      {
        "title": "Models in Production by Governance Status",
        "type": "piechart",
        "targets": [{"expr": "count by (governance_status) (model_serving_active == 1)"}]
      },
      {
        "title": "Governance Pipeline Pass Rate (30d)",
        "type": "stat",
        "targets": [{"expr": "sum(governance_pipeline_passed_total) / sum(governance_pipeline_runs_total)"}]
      },
      {
        "title": "Pending Approvals",
        "type": "stat",
        "targets": [{"expr": "count(governance_approval_status{status='pending'})"}],
        "thresholds": [{"value": 0, "color": "green"}, {"value": 3, "color": "yellow"}, {"value": 5, "color": "red"}]
      },
      {
        "title": "Governance Check Failures by Stage (7d)",
        "type": "barchart",
        "targets": [{"expr": "sum by (stage) (increase(governance_check_failed_total[7d]))"}]
      },
      {
        "title": "Time from Training to Production (P50)",
        "type": "gauge",
        "targets": [{"expr": "histogram_quantile(0.5, rate(model_training_to_production_seconds_bucket[30d]))"}],
        "unit": "hours"
      },
      {
        "title": "Models with Expired Governance Approvals",
        "type": "table",
        "targets": [{"expr": "governance_approval_age_seconds > 7776000"}]
      }
    ]
  }
}

Expected Behaviour

  • Every model artifact in the registry has a signed governance record linking it to the exact model hash
  • Governance checks run automatically on every model version; no manual steps required for automated checks
  • High-risk models require human approval from designated roles before deployment proceeds
  • Models deployed without governance records trigger critical alerts within 5 minutes
  • Governance record hash mismatches (modified model after approval) trigger critical alerts
  • Governance dashboard shows real-time status of all models: governed, pending, expired, or ungoverned
  • Average time from training completion to production deployment (including governance) is under 4 hours for low-risk models
  • Governance approvals expire after 90 days, requiring re-evaluation

Trade-offs

  • Blocking governance pipeline in CI/CD
    Impact: No ungoverned model reaches production.
    Risk: Governance pipeline failures block all ML deployments.
    Mitigation: Keep governance checks fast (under 10 minutes total). Have an expedited manual override path for urgent fixes with post-hoc documentation.

  • Signed governance records
    Impact: Tamper-proof link between governance decision and model artifact.
    Risk: Key management overhead; signing adds pipeline complexity.
    Mitigation: Use cosign with keyless signing (Sigstore) for simplicity. Store signatures alongside artifacts in an OCI registry.

  • Human approval for high-risk models
    Impact: Critical decisions reviewed by qualified humans.
    Risk: Approval bottleneck when approvers are unavailable.
    Mitigation: Define backup approvers. Set an SLA for approval response (4 hours). Auto-escalate when the SLA is breached.

  • 90-day governance expiry
    Impact: Forces periodic re-evaluation as regulations and data evolve.
    Risk: Operational burden of re-running governance for stable models.
    Mitigation: Auto-renew governance for models with no code, data, or configuration changes. Full re-run only when inputs change.

  • Continuous compliance monitoring
    Impact: Catches governance drift in production.
    Risk: Additional metrics and alerting infrastructure.
    Mitigation: Governance metrics are lightweight (one series per model). Marginal overhead on an existing Prometheus deployment.

Failure Modes

  • Governance pipeline check script has a bug
    Symptom: Checks pass when they should fail, or vice versa.
    Detection: Periodic manual audit of governance check results against known-good and known-bad models.
    Recovery: Fix the check script. Re-run governance for all models approved during the buggy period.

  • Model deployed via a bypass mechanism
    Symptom: Model serving without a governance record.
    Detection: ModelMissingGovernanceRecord alert fires.
    Recovery: Block traffic. Run the governance pipeline. If the model passes, re-enable; if it fails, roll back.

  • Cosign key compromised
    Symptom: Attacker can sign fraudulent governance records.
    Detection: Key usage monitoring; unexpected signing events.
    Recovery: Rotate keys. Re-sign all governance records with the new key. Investigate the scope of the compromise.

  • Approval workflow blocked by unavailable approvers
    Symptom: Models stuck in pending state; deployments delayed.
    Detection: Pending approval count metric exceeds threshold; escalation alert fires.
    Recovery: Activate backup approvers. Review approval role assignments to ensure coverage across time zones.

  • Governance dashboard data goes stale
    Symptom: Dashboard shows incorrect compliance status.
    Detection: Dashboard data freshness check; staleness alert.
    Recovery: Fix the metrics pipeline. Reconcile dashboard state against the actual model registry.

When to Consider a Managed Alternative

Building and maintaining a governance pipeline across dozens of models and multiple teams requires sustained engineering investment.

  • Vanta (#169): Automated compliance monitoring that integrates with ML pipelines. Maps governance controls to regulatory frameworks (EU AI Act, NIST AI RMF). Generates audit-ready evidence packages.
  • Grafana Cloud (#108): Governance dashboards with alerting. Visualise pipeline pass rates, pending approvals, and governance coverage across all models. Correlate governance events with model performance metrics.
  • Axiom (#112): Store governance records, approval histories, and audit trails with full-text search. Query historical governance decisions to demonstrate compliance trends.

Premium content pack: AI governance pipeline pack. Complete governance pipeline configuration (GitHub Actions, GitLab CI, Argo Workflows), governance check scripts (data provenance, model card validation, security review), approval workflow implementation (Python), cosign signing integration, Prometheus alert rules for continuous compliance monitoring, and Grafana governance dashboard templates.