Securing AI Infrastructure: A Practical, Risk-Tiered Playbook Aligned to ETSI EN 304 223 / UK AI Cybersecurity Code of Practice

AI Security · ETSI EN 304 223 · February 2026

The Problem: AI Is Deploying Fast, But the Infrastructure Isn’t Keeping Up

Most AI security conversations focus on the model layer – prompt injection, hallucinations, alignment, responsible AI. Important topics, but they miss the bigger exposure. The infrastructure underneath AI systems – the IAM roles, network paths, data stores, API gateways, compute environments, and CI/CD pipelines – is where the real blast radius lives.

A prompt injection might leak a conversation. A misconfigured service account on an autonomous agent with write access to production HR systems can exfiltrate an entire employee database. An ML training pipeline running in a shared VPC with no egress controls can become a lateral movement path. A model serving endpoint exposed without rate limiting becomes a model extraction target.

These aren’t theoretical risks. They’re infrastructure misconfigurations that exist today in organisations deploying AI at pace – because the security team is focused on what the model says rather than what the infrastructure allows. In most real-world AI incidents we’ve reviewed, the model wasn’t compromised. The cloud configuration was.

The challenge is compounded by the sheer variety of AI deployment patterns now in production. A single organisation might have embedded Copilot features, citizen-built Power Automate agents, Python services calling LLM APIs, SageMaker training pipelines, and self-hosted inference endpoints – each with radically different infrastructure footprints, privilege requirements, and attack surfaces. Traditional cloud security baselines weren’t designed for this.

Prompt-level guardrails can be bypassed. Network segmentation cannot. AI security is only as strong as the infrastructure it runs on.

ETSI TS 104 223, published in April 2025 and building directly on the UK’s Code of Practice for AI Cyber Security, gave the industry its first globally recognised baseline for securing AI systems. Since then, following the harmonisation process, it has become ETSI EN 304 223. The essence of the standard is thirteen principles across five lifecycle phases, expanding to 72 trackable provisions. It’s comprehensive, well-structured, and necessary.

The standard was accompanied by an Implementation Guide – which I authored – providing non-exhaustive scenarios and practical solutions for meeting each provision. This guide has since been adopted as ETSI TR 104 128. It was a valuable first step in translating the principles into actionable guidance. But it was deliberately technology-agnostic – it tells you what good implementation looks like, not which VPC configuration, which IAM policy, or which KMS rotation schedule to deploy on your specific cloud platform. That infrastructure-level translation is exactly the gap this playbook addresses.

Meanwhile, AI adoption isn’t waiting for security teams to close that gap. Organisations are deploying embedded AI features, citizen-built automations, API-consuming agents, ML pipelines, and increasingly autonomous multi-agent systems – often without a clear view of the infrastructure security implications.

Our Approach: Infrastructure Controls, Scaled by Risk

We’ve built the AI Infrastructure Security Playbook around a core insight: the infrastructure security controls for a Microsoft 365 Copilot deployment and a distributed multi-agent mesh have almost nothing in common – yet both need to be secured, and both map to the same ETSI standard. A single checklist either wastes effort on simple deployments or leaves dangerous gaps in complex ones.

The playbook introduces an eight-level Risk Gradient (L1–L8) that classifies AI deployments by their infrastructure risk profile – not what the model does, but what the deployment pattern allows. Each level represents a distinct infrastructure footprint with different IAM requirements, network exposure, data access patterns, and compute isolation needs:

| Level | Pattern | Primary Risk Driver |
| --- | --- | --- |
| L1 | Embedded AI (Copilot, Atlassian Intelligence) | Data exposure, prompt injection, shadow AI |
| L2 | AI-assisted development (vibe coding) | Unreviewed AI-generated code in production |
| L3 | Citizen developer agents (Power Automate, Agentforce) | Over-permissioned connectors, workflow abuse |
| L4 | Data analytics & API orchestration | API misuse, credential leakage |
| L5 | Custom autonomous agents | Tool abuse, autonomy escalation |
| L6 | Data science & ML pipelines | Data poisoning, model supply chain |
| L7 | Model hosting & serving | Model extraction, adversarial inputs |
| L8 | Distributed / multi-agent systems | Lateral compromise, agent-to-agent trust abuse, autonomous privilege escalation |

These eight levels map to four assurance zones that group tiers by the maturity and depth of infrastructure controls required:

HIGH ASSURANCE
- L8 – Distributed / Multi-Agent Systems: lateral compromise, cascade failure
- L7 – Model Hosting & Serving: model extraction, adversarial inputs

ADVANCED
- L6 – Data Science & ML Pipelines: data poisoning, model supply chain
- L5 – Custom Autonomous Agents: tool abuse, autonomy escalation

STANDARD
- L4 – Data Analytics & API Orchestration: API misuse, credential leakage
- L3 – Citizen Developer Agents: over-permissioned connectors

FOUNDATIONAL
- L2 – AI-Assisted Development: unreviewed AI-generated code
- L1 – Embedded AI (Copilot, SaaS): data exposure, shadow AI

↑ Higher infrastructure risk & control complexity

L8 deserves particular attention. As we highlight in the recently published OWASP Top 10 for Agentic Applications – produced through the OWASP Agentic Security Initiative, which I lead – multi-agent systems introduce infrastructure risks that have no precedent in traditional cloud security: agent-to-agent trust abuse, tool invocation loops, self-provisioning service accounts, and autonomous privilege escalation. These aren’t model-layer problems. They’re infrastructure control failures that require network-level circuit breakers, capability-scoped identity boundaries, and real-time cascade detection.
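To make the idea of a network-level circuit breaker concrete, here is a minimal Python sketch of a sliding-window breaker that trips when an agent hammers the same tool and fails closed until a human resets it. All names (`ToolCircuitBreaker`, `agent_id`, `tool`) and the thresholds are illustrative assumptions, not part of the playbook; in practice this logic sits at the gateway between the agent runtime and its tools, not inside the agent.

```python
import time
from collections import deque


class ToolCircuitBreaker:
    """Trips when an agent invokes the same tool too often in a sliding
    window. Illustrative sketch only: a real control would be enforced
    at the tool gateway, outside the agent's own trust boundary."""

    def __init__(self, max_calls=20, window_seconds=60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = {}      # (agent_id, tool) -> deque of timestamps
        self.tripped = set() # breakers awaiting human reset

    def allow(self, agent_id, tool, now=None):
        now = time.monotonic() if now is None else now
        key = (agent_id, tool)
        if key in self.tripped:
            return False  # fail closed until a human resets the breaker
        q = self.calls.setdefault(key, deque())
        while q and now - q[0] > self.window:
            q.popleft()  # drop calls outside the sliding window
        if len(q) >= self.max_calls:
            self.tripped.add(key)
            return False
        q.append(now)
        return True
```

A gateway would call `allow()` before every tool invocation and route a tripped breaker to an incident queue rather than silently retrying.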

Crucially, the gradient is descriptive, not prescriptive. Organisations assess their deployments against the baseline tier, then apply five cross-cutting risk modifiers to determine the effective risk level:

- 🌍 Exposure: Internal → Public
- 🔒 Data Sensitivity: Public → Regulated
- 🤖 Autonomy: Human-approved → Autonomous
- Integration Privilege: Read-only → Write to prod
- 🏭 Physical-World Effect: Informational → Irreversible

Each modifier rated High adds one level to the effective tier; two or more High ratings escalate by at least two levels and trigger a formal risk assessment.

These modifiers capture the infrastructure reality that a standard tier classification misses. A citizen-built Power Automate flow that auto-executes payroll adjustments with write access to production Dynamics 365 HR doesn’t stay at L3. The modifiers – autonomous execution, privileged integration, financial impact – push it to L6, and the infrastructure control set scales accordingly: dedicated service accounts with managed identity, encrypted data stores with CMK, DLP on connectors, environment isolation, and full audit logging.

The operational rule is straightforward: if two or more high-impact modifiers apply – for example, privileged integration combined with autonomous execution, or sensitive data combined with public exposure – escalate at least two tiers above the baseline classification and conduct a formal risk assessment. This isn’t a scoring formula; it’s a minimum escalation threshold that ensures infrastructure controls aren’t under-specified for the actual risk the deployment carries.
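The escalation rule above can be encoded in a few lines. The tier keys, modifier names, and the cap at L8 below are illustrative assumptions for the sketch, not definitions from the standard:

```python
BASELINE_TIERS = {
    "embedded_ai": 1, "ai_assisted_dev": 2, "citizen_agent": 3,
    "api_orchestration": 4, "autonomous_agent": 5,
    "ml_pipeline": 6, "model_serving": 7, "multi_agent": 8,
}

MODIFIERS = ("exposure", "data_sensitivity", "autonomy",
             "integration_privilege", "physical_world_effect")


def effective_tier(pattern, ratings):
    """Return (effective tier, needs_formal_risk_assessment).

    Each modifier rated 'high' adds one level; two or more highs
    already satisfy the +2 minimum and trigger a formal assessment.
    The gradient tops out at L8.
    """
    base = BASELINE_TIERS[pattern]
    highs = sum(1 for m in MODIFIERS if ratings.get(m, "low") == "high")
    return min(base + highs, 8), highs >= 2
```

The payroll example from the text works out the same way: a citizen-built flow (baseline L3) with high autonomy, high integration privilege, and high physical-world effect lands at L6 with a mandatory formal risk assessment.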

Seven Infrastructure Control Dimensions

For each risk gradient level, the playbook provides checklist-driven infrastructure controls across seven dimensions. These aren’t abstract governance categories – they map directly to cloud platform services and configurations, with each dimension addressing a specific layer of the deployment stack:

- 🛡️ Governance – Policies, risk assessment, training, human oversight, DPIA
- 🔑 Identity – Access control, MFA, credential management, RBAC, JIT
- 🗄️ Data – Classification, DLP, encryption, provenance, lineage
- ⚙️ Processing – Compute isolation, CI/CD gates, guardrails, separation
- 🌐 Network – Segmentation, private endpoints, WAF, egress filtering
- 📦 Supply Chain – Vendor assessment, dependency scanning, SBOM, integrity
- 📊 Monitoring & SecOps – SIEM integration (Sentinel, Splunk, Chronicle), CSPM baselines, AI-specific detection rules, canary mechanisms, vulnerability scanning, patching SLAs, drift detection, and 12-month log retention – mapped per cloud platform across all eight tiers
The depth of each dimension scales with the deployment’s risk level. At L1, Identity means Conditional Access with MFA on Copilot licences. At L5, it means dedicated managed identities per agent with permission boundaries preventing self-escalation. At L8, it means mTLS between agents with short-lived credentials and cryptographic identity verification. Same dimension – fundamentally different infrastructure.

SMEs and SaaS-First Organisations: Recognising Your True Tier

Not every organisation runs its own cloud infrastructure – and the risk gradient applies to them just as much. SMEs and SaaS-first organisations that rely on Microsoft 365, Google Workspace, Atlassian Cloud, and Salesforce are climbing the gradient involuntarily, often without realising they’ve moved.

An employee using Copilot to summarise emails is consuming AI at L1. The moment someone builds a Power Automate flow with an AI connector – chaining tools, granting permissions, creating persistent automations that act on their behalf – they’ve crossed into L3. When an employee uses Cursor or Replit to vibe-code a customer-facing app that handles real data, they’ve entered L5 territory. The organisation thinks it’s “just using tools.” The risk gradient says otherwise.

AI doesn’t create new data access — it surfaces what existing permissions already allow, but faster, more comprehensively, and through natural language queries that bypass the friction that previously protected poorly permissioned content.

The critical gap is that Copilot is only as secure as the tenancy it reads from. If SharePoint permissions are a mess – and in most SMEs they are, after years of inherited access and staff turnover – AI will happily surface board papers to anyone who asks. The controls at this level aren’t about VPCs and IAM policies. They’re about tenancy hardening: permissions audits, sensitivity labels, DLP policies, conditional access, encryption verification, and managed SOC integration. Same ETSI provisions, same risk gradient – different implementation context.

For citizen-built automations (L3–L4), the controls become governed automation: Power Platform environment isolation, connector DLP policies that prevent data egress through HTTP connectors, approval workflows for any automation that moves data between applications, and offboarding processes that catch orphaned flows when staff leave. For vibe-coded applications (L5–L6), it’s controlled creation: an application registry, mandatory security scanning, deployment governance, and decommissioning processes – because when the person who built it leaves, nobody can maintain, patch, or shut it down.
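The offboarding check described above is simple to operationalise once an automation inventory exists. The sketch below assumes a hypothetical inventory record – the field names are invented for illustration, and real data would come from the platform's admin APIs rather than a hand-built list:

```python
from dataclasses import dataclass, field


@dataclass
class AutomationRecord:
    """One citizen-built flow in the (hypothetical) automation registry."""
    flow_id: str
    owner: str
    writes_to: list = field(default_factory=list)  # systems the flow can modify


def orphaned_flows(flows, active_users):
    """Flag automations whose owner has left the organisation,
    prioritising those with write access to the most systems."""
    orphans = [f for f in flows if f.owner not in active_users]
    return sorted(orphans, key=lambda f: -len(f.writes_to))
```

Run against the joiner/mover/leaver feed, this turns "offboarding processes that catch orphaned flows" from a policy statement into a recurring report.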

We’ve published a dedicated supplemental guide – AI Risk Gradient for SMEs and SaaS-First Organisations – that translates every risk tier into SaaS tenant configuration, includes a diagnostic for recognising your true tier, provides an MSSP evaluation framework, and maps every control to ETSI EN 304 223 provisions. It’s available in the download bundle below.

Cloud-Specific Implementation Guides

The main playbook defines the controls. Three companion guides map every control to specific services, configurations, IAM policies, and network settings on each major cloud platform – the level of detail a platform engineer needs to implement them:

- AWS – Security Hub, GuardDuty, SageMaker, Bedrock, Secrets Manager, KMS, VPC PrivateLink, Inspector, CloudTrail
- Azure – Defender for Cloud, Sentinel, Entra ID, Key Vault, Azure OpenAI, Azure ML, Purview, PIM, Private Link
- GCP – Security Command Center, Chronicle, Vertex AI, Cloud KMS, Secret Manager, VPC Service Controls, Binary Authorization
Each guide includes a service mapping table and tier-by-tier checkbox checklists – specific enough that a platform engineering team can pick up their cloud guide and start implementing controls against their assessed risk gradient level immediately.

The table below shows how each security control in the playbook maps to concrete services across all three clouds, with every row aligned to ETSI EN 304 223 provisions. This is the translation layer – the same control, three implementations, one standard:

| Control | AWS | Azure | GCP | ETSI |
| --- | --- | --- | --- | --- |
| **Foundational (L1–L2)** | | | | |
| Identity & Access Control | IAM Identity Center, SCPs | Entra ID Conditional Access, PIM | Cloud Identity Context-Aware Access, Org Policies | P4, P5 |
| Encryption & Key Management | KMS CMK, SSE-KMS | Key Vault CMK, Storage Encryption | Cloud KMS CMEK | P6 |
| CSPM & Posture | Security Hub, GuardDuty | Defender for Cloud, Sentinel | SCC Premium, Chronicle | P3, P11 |
| Credential Hygiene | IAM Roles only | Managed Identity only | Workload Identity only | P5, P6 |
| **Standard (L3–L4)** | | | | |
| Private Service Isolation | VPC Endpoints, PrivateLink | Private Endpoints, VNet | Private Service Connect, VPC SC | P2, P6 |
| API Gateway & Rate Limiting | API Gateway, WAF | APIM, WAF | Apigee, Cloud Armor | P2, P4 |
| Egress Enforcement | Network Firewall, VPC Endpoints | Azure Firewall, UDR | Cloud NAT, Firewall Rules | P2, P6 |
| **Advanced (L5–L6)** | | | | |
| Per-Agent Identity | IAM Role per Agent, Permission Boundaries | Managed Identity per Agent, Deny Assignments | SA per Agent, IAM Deny Policies | P4, P5 |
| Automated Kill Switch | IAM Deny, SG Modify, SNS | Entra Revoke, NSG, Firewall | IAM Deny, Firewall, WI Revoke | P2, P4, P11 |
| Model Registry & Integrity | SageMaker Registry, SHA-256 | Azure ML Registry, SHA-256 | Vertex AI Registry, SHA-256 | P7, P9 |
| Secure Decommissioning | IAM Role Delete, KMS Schedule Delete, S3 Lifecycle | MI Removal, Key Vault Purge, Retention Policy | SA Delete, KMS Destroy, Retention Lock | P13, P6 |
| **High Assurance (L7–L8)** | | | | |
| mTLS Between Agents | App Mesh, ACM Private CA | AKS Service Mesh | Anthos Service Mesh | P6, P7 |
| Workload Policy Enforcement | EKS PSA, OPA Gatekeeper | AKS Pod Security, Azure Policy | GKE PSA, OPA Gatekeeper | P3, P4 |
| Extraction Detection | API GW + WAF, CloudWatch | APIM, Sentinel Analytics | Apigee, Cloud Monitoring | P8, P11, P12 |
Cross-Cloud Security Invariant: Agents must never modify their own identity bindings, access policies, or trust relationships — enforced identically across all three clouds.
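On AWS, one way that invariant might be expressed is an IAM-style deny policy attached as a permission boundary or SCP. The sketch below is illustrative only – the role ARN pattern is an assumption, and the statement has not been validated against a live account; Azure and GCP would use deny assignments and IAM deny policies respectively:

```python
import json

# Illustrative ARN pattern matching all agent roles; an assumption, not a
# convention from the playbook.
AGENT_ROLE_ARN = "arn:aws:iam::123456789012:role/agent-*"

# Deny every action that would let an agent rewrite its own identity
# bindings, access policies, or trust relationships.
SELF_MODIFICATION_DENY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAgentSelfModification",
            "Effect": "Deny",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy",
                "iam:UpdateAssumeRolePolicy",
            ],
            "Resource": AGENT_ROLE_ARN,
        }
    ],
}

print(json.dumps(SELF_MODIFICATION_DENY, indent=2))
```

Because IAM evaluates an explicit Deny ahead of any Allow, no permission granted elsewhere can override this boundary.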

Getting Started: Five Steps

Every document includes a practical Getting Started workflow that organisations can begin executing immediately:

1. Build Your AI Inventory – Discover every AI workload. Classify against L1–L8. Apply modifiers. Assign ownership.
   Outputs: AI asset register · risk gradient per workload · effective tier (with modifiers)

2. Assess Baseline with CSPM – Run Defender for Cloud / Security Hub / SCC across all AI workloads. Record your starting posture.
   Outputs: CSPM score per account · findings by severity · coverage gaps identified

3. Map to ETSI EN 304 223 – Use the playbooks to map CSPM findings to ETSI EN 304 223 provisions and build a prioritised remediation plan. Our forthcoming agent, DeepCyberCop, will automate this mapping.
   Outputs: ETSI provision mapping · prioritised remediation list · tier-specific action items

4. Remediate by Priority – Work the list. Start at your highest risk tiers. Target quick wins that close multiple ETSI provisions – encryption, MFA, logging, least-privilege.
   Outputs: controls implemented · CSPM score improvement · provision coverage %

5. Continuous Governance – Recurring CSPM reviews, quarterly reassessments, evolving inventory. Report posture using CSPM scores and ETSI coverage as key metrics.
   Outputs: leadership reporting · trend dashboards · audit evidence

↻ Continuous cycle – inventory evolves as AI workloads change
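The mapping-and-prioritisation step (3 and 4) reduces to a sort once a crosswalk exists. The finding categories and provision sets below are invented for illustration – they are not an official ETSI mapping, which is exactly what the playbooks (and eventually DeepCyberCop) supply:

```python
# Hypothetical crosswalk from CSPM finding categories to ETSI provisions.
FINDING_TO_PROVISIONS = {
    "unencrypted_storage": {"P6"},
    "missing_mfa": {"P4", "P5"},
    "public_endpoint": {"P2", "P6"},
    "no_audit_logging": {"P11"},
}

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def remediation_plan(findings):
    """Order CSPM findings by severity first, then by how many ETSI
    provisions fixing them would close (the 'quick wins' heuristic)."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f["severity"]],
            -len(FINDING_TO_PROVISIONS.get(f["category"], set())),
        ),
    )
```

Feeding CSPM export rows through a function like this yields the prioritised remediation list that step 4 works through.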

Why We Built This

As the author of the UK’s Code of Practice for AI Security Implementation Guide, I’ve seen first-hand where organisations struggle. It’s not with understanding the principles – it’s with translating them into infrastructure controls that platform teams can actually implement, verify, and monitor. The CISO gets the “why.” The platform engineer needs the “which IAM role, which VPC config, which KMS policy.”

This playbook closes that gap. It’s infrastructure-first by design – every control maps to a concrete cloud service configuration, not an abstract recommendation. Checklist-driven, not narrative. Cloud-specific where it needs to be, principle-aligned throughout. We’ve released it to help teams move from “we should secure our AI infrastructure” to “here’s exactly what we’re configuring this sprint.”

The playbooks are freely available for download. We welcome feedback from platform engineers, security architects, and CISOs who are working through the challenge of securing AI infrastructure in their own environments.

Download the Playbooks

Get the main playbook, cloud implementation guides, and SME supplemental guide. Start with Step 1 – build your AI inventory – then use your CSPM and the playbooks to turn infrastructure findings into a prioritised remediation plan. Our forthcoming agent, DeepCyberCop, will automate ETSI provision mapping.

Want help securing your AI infrastructure? Get in touch – we’ll help you assess your risk gradient, review your cloud posture, and build a prioritised remediation plan.
