InsuranceOS: Agentic AI for Insurance

Command Center
Live · Claims Open
Open Claims
284
Across all lines
Fraud Flags Active
7
Investigator review needed
Auto-Adjudicated Today
184
65% straight-through
Avg Settlement Time
3.2d
vs 21d industry avg
AI Agent Status
14 insurance AI agents across claims, underwriting, fraud, and compliance
Fraud Detection Engine: 7 flags · investigating
Claims Triage AI: 284 claims classified
Underwriting AI: 47 policies reviewed
Document Intelligence: 847 docs processed
Actuarial Intelligence: Portfolio risk updated
Regulatory Compliance: All filings current
Live Claims Intelligence Feed
Real-time AI activity across all insurance operations
Priority Claims Requiring Attention
CLM-2024-08471
FRAUD SUSPECTED
Commercial Property - £284,000
Claimant: Meridian Holdings Ltd · Fire damage claim
AI: 3rd fire claim in 24 months · supplier link anomaly · 0.91 fraud score
CLM-2024-07823
COMPLEX
Employers Liability - £1.2M
Claimant: J. Torres · Workplace injury · Legal representation
AI: Liability disputed · medical evidence inconsistent · reserve £1.4M
CLM-2024-09102
AUTO-SETTLE
Motor Own Damage - £4,200
Claimant: A. Sharma · Rear-end collision · Dashcam confirmed
AI: Liability clear · repair validated · recommend immediate settlement
Why InsuranceOS
πŸ” Insurance Fraud: Β£3.2B/Year
UK insurance fraud costs Β£3.2B annually. 80% of fraud is organised and cross-claim β€” invisible to per-claim manual review. InsuranceOS detects network patterns, supplier links, and behavioural anomalies across the entire claims portfolio simultaneously.
⏱ Claims: 21 Days Too Long
Industry average claims settlement: 21 days. InsuranceOS triages, validates, and auto-adjudicates 65% of straight-through claims in under 4 hours. Complex claims handled in 3.2 days. Customer satisfaction: 4.7/5 vs 3.1 industry average.
Underwriting: Mispriced Risk
Manual underwriting misses signals visible only in aggregate data. AI underwriting combines real-time telematics, satellite imagery, social signals, and claims history to price risk 34% more accurately than manual assessment, with a combined ratio of 82% against a 94% manual baseline.
Fraud Suspected
7
Complex / Disputed
34
Straight-Through
184
Under Review
59
Total Reserve
£8.4M
Active Claims Queue
CLM-2024-08471
FRAUD: 0.91
Commercial Property · £284,000
Meridian Holdings · Fire · 3rd claim 24mo
CLM-2024-07341
FRAUD: 0.84
Motor Theft · £28,000
Vehicle history: 2 prior thefts · same garage link
CLM-2024-07823
COMPLEX
Employers Liability · £1.2M
J. Torres · Legal rep · Liability disputed
CLM-2024-08904
REVIEW
Professional Indemnity · £180,000
Architect firm · Design error alleged · Expert needed
CLM-2024-09102
AUTO-SETTLE
Motor OD · £4,200
A. Sharma · Dashcam · Liability confirmed
CLM-2024-09241
AUTO-SETTLE
Home Contents · £2,800
T. Okafor · Burglary · Police report confirmed
Claim Detail: CLM-2024-08471
Commercial Property Fire - £284,000
Claimant: Meridian Holdings Ltd · Notified: 18 May 2026
FRAUD: 0.91
Claim Value
£284,000
Prior Claims
2 fires in 24 months
Fraud Score
0.91 (VERY HIGH)
Network Flag
Supplier link detected
⚠ AI Fraud Intelligence
1. 3rd fire claim from same insured in 24 months: pattern anomaly flagged
2. Loss assessor (J. Connell & Associates) linked to 4 other suspicious fire claims in 18 months
3. Business financial distress signals: CCJs filed against Meridian in Q1 2026
4. Satellite imagery: no fire damage visible to neighbouring structures (inconsistent with claimed cause)
Total Agents
14
Decisions Today
2,847
Fraud Flags
7
Auto-Settled
184
Claims Intelligence Agents
Fraud Detection Engine
Network analysis across claimants, suppliers, solicitors, and loss adjusters. Detects organised fraud rings invisible to per-claim review. Cross-references against industry fraud databases and internal claim history.
Running · 7 flags active
ReAct + Network Graph
Claims Triage AI
Classifies every inbound claim into straight-through, complex, fraud-suspected, or manual-review paths within 60 seconds. Routes automatically to correct workflow. 65% straight-through rate.
Running · 284 classified
Sequential + ML
Document Intelligence
Extracts structured data from police reports, medical records, invoices, repair estimates, and correspondence. Validates internal consistency and cross-references against policy terms and prior claims.
Running · 847 docs
Reflection + Vision
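The four triage paths named for Claims Triage AI can be illustrated with a minimal rule-based sketch. Everything here is an assumption made for illustration: the `Claim` fields, the `triage` function, and both thresholds are hypothetical, and the page itself describes the real agent only as "Sequential + ML".

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    STRAIGHT_THROUGH = "straight-through"
    COMPLEX = "complex"
    FRAUD_SUSPECTED = "fraud-suspected"
    MANUAL_REVIEW = "manual-review"


@dataclass
class Claim:
    claim_id: str
    value_gbp: float
    fraud_score: float        # 0.0-1.0, assumed to be supplied by a fraud model
    liability_disputed: bool
    evidence_complete: bool


# Hypothetical cut-offs, chosen only for this example.
FRAUD_THRESHOLD = 0.80
COMPLEX_VALUE_GBP = 100_000


def triage(claim: Claim) -> Path:
    """Route a claim to one of the four paths; fraud suspicion is checked first."""
    if claim.fraud_score >= FRAUD_THRESHOLD:
        return Path.FRAUD_SUSPECTED
    if claim.liability_disputed or claim.value_gbp >= COMPLEX_VALUE_GBP:
        return Path.COMPLEX
    if not claim.evidence_complete:
        return Path.MANUAL_REVIEW
    return Path.STRAIGHT_THROUGH


if __name__ == "__main__":
    # Values taken from the priority claims list; the 0.05 score is assumed.
    print(triage(Claim("CLM-2024-08471", 284_000, 0.91, False, True)).value)
    print(triage(Claim("CLM-2024-09102", 4_200, 0.05, False, True)).value)
```

Checking the fraud score before anything else mirrors the dashboard, where a high-value claim with a 0.91 score is shown as fraud-suspected and not as complex.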
Underwriting & Risk Agents
AI Underwriting Engine
Risk scoring from telematics, satellite imagery, social signals, and claims history. Recommends premium, terms, and exclusions for new and renewal business. Underwriter review and approval required.
Running · 47 policies
Reflection + Quant
Actuarial Intelligence
Continuous portfolio loss ratio monitoring, reserve adequacy analysis, emerging risk identification, and pricing adequacy signals. Actuarial sign-off required on all reserve recommendations.
Running · Portfolio live
Reflection + Statistics
Catastrophe Risk Monitor
Monitors natural catastrophe exposures, climate risk signals, and accumulation of correlated risks across the portfolio. PML estimates updated dynamically from weather and geological data.
Processing · Live data
ReAct + Geo Data
Operations Agents
Customer Intelligence
Predicts churn risk, identifies cross-sell opportunities, personalises communication, and monitors customer satisfaction signals. Retention intervention triggered 90 days before renewal for at-risk customers.
Running · 12K policies
ReAct + Signals
Subrogation AI
Identifies subrogation and contribution opportunities in settled claims. Calculates recovery likelihood and value. Drafts letter of claim and coordinates with third-party insurers. Solicitor review required.
Running · £847K pipeline
Reflection + Legal
Regulatory Compliance
Monitors FCA conduct requirements, Solvency II capital adequacy, GDPR for claims data, and Lloyd's reporting obligations. Flags compliance gaps before regulatory breach. All submissions FCA-standard.
Running · All compliant
Sequential + Rules
Active Fraud Flags
7
Detected This Month
£2.1M
Fraud prevented
Detection Rate
97%
vs 34% manual
False Positive Rate
3.2%
vs 12% industry avg
Top Fraud Alerts
πŸ” Organised Fraud Ring β€” 4 Claims Β· Β£847,000 Total Exposure
RING: 0.94
Network analysis identified 4 claims (CLM-08471, CLM-07341, CLM-06892, CLM-08124) connected via shared loss assessor (J. Connell & Associates), common solicitor (Avon Legal LLP), and financial distress signals across all claimants. CLM-07341 vehicle previously "stolen" and found β€” same garage as current theft claim. Ring confidence: 0.94.
CLM-08471 Β· Meridian Holdings Β· Fire Β£284K Β· Assessor: J.Connell
CLM-07341 Β· R. Hassan Β· Motor theft Β£28K Β· Garage: QuickFix Bristol
CLM-06892 Β· Delta Properties Β· Water damage Β£320K Β· Assessor: J.Connell
CLM-08124 Β· T. Associates Β· Burglary Β£215K Β· Solicitor: Avon Legal LLP
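One common way to surface a ring like the one above is to treat claims as connected whenever they share a third party, then collect the connected groups. The sketch below illustrates that idea only; it is not the platform's method. `find_rings` is a hypothetical helper, and because the list shows one link per claim, the solicitor links marked "assumed" are invented so that the four claims join up.

```python
from collections import defaultdict


def find_rings(claim_links: dict[str, set[str]], min_size: int = 2) -> list[set[str]]:
    """Group claims that share at least one third party, directly or transitively."""
    # Invert the mapping: third party -> claims that mention it.
    by_entity: dict[str, set[str]] = defaultdict(set)
    for claim, entities in claim_links.items():
        for entity in entities:
            by_entity[entity].add(claim)

    seen: set[str] = set()
    rings: list[set[str]] = []
    for start in claim_links:
        if start in seen:
            continue
        # Depth-first walk over claims reachable through shared entities.
        component: set[str] = set()
        stack = [start]
        while stack:
            claim = stack.pop()
            if claim in component:
                continue
            component.add(claim)
            for entity in claim_links[claim]:
                stack.extend(by_entity[entity] - component)
        seen |= component
        if len(component) >= min_size:
            rings.append(component)
    return rings


links = {
    "CLM-08471": {"assessor:J. Connell"},
    "CLM-06892": {"assessor:J. Connell", "solicitor:Avon Legal LLP"},   # solicitor link assumed
    "CLM-07341": {"garage:QuickFix Bristol", "solicitor:Avon Legal LLP"},  # solicitor link assumed
    "CLM-08124": {"solicitor:Avon Legal LLP"},
    "CLM-00000": {"garage:Unrelated Garage"},  # hypothetical unrelated claim
}

if __name__ == "__main__":
    for ring in find_rings(links):
        print(sorted(ring))
```

A production system would also weight links and score confidence; this sketch stops at grouping.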
Fraud Detection: 5 Signal Types
Network Analysis
Maps connections between claimants, loss adjusters, solicitors, repair garages, and medical providers. Identifies rings and clusters invisible to individual claim review. Updates in real time as new claims arrive.
Behavioural Signals
Claim timing patterns (e.g. claims filed shortly after premium increase), notification delays, inconsistent event narratives, and history of prior claims across the industry database (IFB cross-reference).
External Data
Satellite imagery, weather data, Companies House records, CCJ filings, DVLA data, and social media signals cross-referenced against claim circumstances to identify implausible or inconsistent claims.
Policies Reviewed
47
Combined Ratio (AI)
82%
vs 94% manual baseline
Pricing Accuracy
+34%
vs manual assessment
Declination Rate
12%
High-risk correctly declined
AI Underwriting: Commercial Property
Risk 0847 · Riverside Industrial Estate · £4.2M TSI
underwriting-agent · RISK-0847
INGEST → Survey, accounts, claims history parsed
GEO → Satellite: flood zone 2 · no recent damage
FIRE → Sprinkler system: confirmed · BAFE cert
CLAIMS → 1 minor claim £8K in 5yr · clean
PRICE → Rate: 0.12% of TSI · £5,040 premium
RECMD → ACCEPT · Standard terms · UW review
AI Recommendation: Accept. Good risk: sprinkler installed, clean claims history, flood zone acceptable. Recommended premium: £5,040 at 0.12% of TSI. Underwriter review and binding required.
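The premium in this trace follows directly from the rate and the total sum insured (TSI). As a quick check of the arithmetic, here is a small sketch; the `annual_premium` helper is hypothetical, and `Decimal` is used so the currency value is not subject to binary floating-point rounding.

```python
from decimal import Decimal


def annual_premium(tsi_gbp: Decimal, rate_percent: Decimal) -> Decimal:
    """Premium = TSI x rate, with the rate quoted as a percentage of TSI."""
    return (tsi_gbp * rate_percent / Decimal(100)).quantize(Decimal("0.01"))


premium = annual_premium(Decimal("4200000"), Decimal("0.12"))

if __name__ == "__main__":
    print(f"£{premium:,}")  # £5,040.00
```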
Underwriting Intelligence: Data Sources
7 data sources combined for every risk assessment
Claims history: Internal + IFB cross-industry database. 5-year loss pattern and frequency scoring.
Satellite & geo data: Flood zones, subsidence risk, crime rates, neighbouring hazards, building condition from aerial imagery.
Financial intelligence: Companies House, CCJ filings, credit signals. Financial distress is a leading indicator of moral hazard.
Telematics (motor): Real-time driving behaviour data for motor risks (speed, braking, time-of-day, route risk).
Survey intelligence: AI reads survey reports, extracts defects, and scores risk quality, with no manual summarisation needed.
Combined Ratio
84%
Portfolio YTD
Reserve Adequacy
101%
Emerging Risk Flags
3
Pricing Adequacy
+8%
Above required margin
Actuarial Intelligence: What It Monitors
Actuarial Intelligence provides continuous portfolio analytics that previously required quarterly actuarial runs. Reserve adequacy is assessed daily against emerging claims development, flagging reserve deficiencies before they become material. Loss ratio monitoring by product line, geography, and distribution channel identifies adverse development early. Pricing adequacy analysis compares written premium rates against current loss cost trends, identifying lines where rates need adjustment before the underwriting cycle turns. Three emerging risk signals are currently flagged: climate-driven subsidence (South-East England), cyber supply chain (commercial lines), and an increase in electric vehicle battery fire claims frequency. All reserve and pricing recommendations require Chief Actuary review and sign-off.
Policies Monitored
12,847
Churn Risk Flags
384
Renewal in 90 days
Retention Rate (AI)
94%
vs 81% pre-AI
Customer Satisfaction
4.7/5
Customer Intelligence: Retention & Growth
Customer Intelligence monitors 12,847 policies for churn signals 90 days before renewal: price sensitivity (competitor quote requests), claim dissatisfaction scores, life event triggers (property purchase, new vehicle, business change), and engagement decline. Retention outreach is personalised rather than a blanket renewal reminder. Cross-sell recommendations are generated from life event signals and coverage gap analysis (e.g. a customer has home contents but no buildings cover, so buildings cover is recommended at renewal). NPS tracking and complaint root cause analysis feed back into product and pricing development.
FCA Compliance
100%
Solvency II (SCR)
184%
Coverage ratio
Regulatory Filings
All current
Conduct Risk Flags
0
Regulatory Compliance Intelligence
InsuranceOS maintains continuous compliance monitoring across all applicable regulatory frameworks. FCA Consumer Duty: all claims handling decisions monitored for fair customer outcomes, with pricing, settlement, and communications reviewed for conduct risk. Solvency II: SCR and MCR calculations updated daily. GDPR: all claims data processing logged with purpose limitation and retention schedules enforced. Lloyd's reporting: bordereaux and premium accounts auto-generated. IDD compliance: all advice and recommendation processes documented to insurance distribution standards equivalent to MiFID II. Regulatory examination readiness: evidence packs maintained and audit-ready at all times.
Recovery Pipeline
£847K
Recovery Rate (AI)
72%
vs 41% manual
Opportunities Found
47
This quarter
Avg Recovery Value
£18K
Subrogation AI: Recovery Intelligence
Subrogation opportunities are often missed in manual claims processing because handlers are focused on settlement, not recovery. Subrogation AI reviews every settled claim for third-party liability: motor accidents, defective products, contractor negligence, and neighbour liability. It calculates recovery likelihood and estimated value, identifies the responsible third party and their insurer, drafts the letter of claim, and tracks the recovery process through to resolution. Solicitor review is required before any letter of claim is issued. Recovery rate with AI: 72% vs 41% manual. Average quarterly recovery uplift for a mid-size insurer: £340K.
Agents Active
14
Decisions/Day
2,847
Fraud Prevented
£2.1M
Compliance Events
0
Live Agent Trace
All AI decisions logged · FCA · Solvency II · GDPR compliant
Insurance AI Governance
Advisory intelligence: underwriters and adjusters decide
No autonomous claims decisions: All settlement, decline, and reserve decisions require authorised adjuster or underwriter approval. AI recommends and routes; humans decide and bind.
FCA Consumer Duty: All AI outputs monitored for fair customer outcomes. Explainability requirement met for every claims and underwriting decision, with no black-box outputs.
Fraud referral protocol: All fraud flags presented as suspicions requiring investigation, never automated declines. IFB referral and SIU escalation follow defined protocols. Human judgment always governs outcome.
Actuarial sign-off: All reserve and pricing recommendations require Chief Actuary review. AI provides analysis; qualified actuaries certify reserves per regulatory requirements.
AgentOps: Live Agent Observability

Live Trace Feed

Session Metrics (24h)

Total Sessions: 2,847
Avg Latency: 1.4s
P95 Latency: 3.1s
Error Rate: 0.3%
Tool Calls: 12,284
HITL Escalations: 47
RAGAS Gate: PASS ✓

Cost & Tokens

Cost (24h): £847
Input Tokens: 48.2M
Output Tokens: 12.4M
Cache Hit Rate: 67%
Cost/Session: £0.30
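The per-session cost can be reproduced from the 24-hour totals shown in these panels. The sketch below is only a check of that arithmetic; `per_session` is a hypothetical helper, and the tool-calls-per-session figure is derived here and does not appear on the dashboard.

```python
def per_session(total: float, sessions: int) -> float:
    """Average of a 24h total over the number of sessions in the same window."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return total / sessions


cost = per_session(847.0, 2847)        # £847 over 2,847 sessions
tool_calls = per_session(12284, 2847)  # 12,284 tool calls over 2,847 sessions

if __name__ == "__main__":
    print(f"£{cost:.2f} per session, {tool_calls:.1f} tool calls per session")
```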

🎯 RAGAS Quality Scores

Faithfulness: 0.94 ✓
Answer Relevance: 0.91 ✓
Context Precision: 0.89 ✓
Context Recall: 0.93 ✓
Hallucination Rate: 0.8%

Agent Health

All agents: Healthy
Orchestrator: Active
Tool registry: Online
MCP servers: Connected
Memory store: Healthy
MLOps / LLMOps: Model Lifecycle

Model Registry

claude-sonnet-4-5 · PRODUCTION · Primary
claude-haiku-4-5 · ROUTING · Fast path
claude-opus-4-5 · SHADOW · Complex
text-embedding-3-large · RAG · Vectors

Automatic fallback routing. Versioned in MLflow. Prompt changes require RAGAS eval gate pass.

Drift Detection

Faithfulness drift (7d): +0.02 stable
Latency drift (7d): +120ms watch
Output length drift: Within ±5%
Sentiment drift: No anomaly
Alert threshold: Δ>0.05 → PagerDuty
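The alert threshold above can be read as a simple comparison of a current score against a baseline. A minimal sketch under stated assumptions follows: `drift_alerts` is a hypothetical function, the 0.92 baseline is inferred from the 0.94 faithfulness score and the +0.02 drift, treating drift in either direction as alertable is an assumption, and the PagerDuty call is left out so the function only returns the messages it would send.

```python
def drift_alerts(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.05,
) -> list[str]:
    """Return one message per metric whose absolute change exceeds the threshold."""
    alerts = []
    for name, base in baseline.items():
        delta = current[name] - base
        if abs(delta) > threshold:
            alerts.append(f"{name}: drift {delta:+.2f} exceeds ±{threshold}")
    return alerts


if __name__ == "__main__":
    baseline = {"faithfulness": 0.92}
    print(drift_alerts(baseline, {"faithfulness": 0.94}))  # within threshold
    print(drift_alerts(baseline, {"faithfulness": 0.86}))  # hypothetical regression
```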

A/B Experiment Controller

Prompt v2.3 vs v2.4: Running
CoT vs Direct: Staging

Statistical significance (p<0.05) required before promotion.

Feature Store

Vector Index: Pinecone
Dimensions: 3,072
Indexed Docs: 284K
Retrieval P95: 42ms

Prompt Version Control

System prompts: Git-tracked
Few-shot examples: Versioned
Eval datasets: DVC tracked
DevSecOps: Security-First CI/CD Pipeline

CI/CD Pipeline

SAST (Semgrep + Bandit): PASS
SCA (SBOM + Trivy): PASS
Unit + Integration tests: 847/847
RAGAS eval gate (≥0.92): 0.94 ✓
Secrets scan (Gitleaks): CLEAN
Container scan (Grype): 0 CRITICAL
Deploy → Kubernetes: DEPLOYED

πŸ” Security Posture

RBAC β€” Role-based accessEnforced
API keys β€” HashiCorp VaultRotated 30d
mTLS β€” Istio service meshActive
PII scrubbing β€” NeMoActive
Audit log β€” ImmutableCloudWatch
Pen testQuarterly
SOC 2 Type IIIn progress
ISO 27001Compliant

πŸ— Infrastructure as Code

TerraformCloud infra
HelmK8s workloads
ArgoCD GitOpsSynced
Kustomize overlaysdev/stg/prd

♻️ Rollback & DR

RTO Target<15 min
RPO Target<5 min
Blue/Green DeployActive
Auto-rollbackError rate >1%

πŸ“‹ Regulatory Compliance

GDPR Art. 22 HITLEnforced
EU AI Act Art. 9Documented
NIST AI RMFMapped
ISO/IEC 42001Compliant
AI Observability: OpenTelemetry + Langfuse

Observability Stack

L1 Traces: OpenTelemetry → Jaeger
L2 Metrics: Prometheus → Grafana
L3 LLM Traces: Langfuse (self-hosted)
L4 Logs: Fluentd → OpenSearch
L5 Alerts: AlertManager → PagerDuty

SLO Dashboard

Availability SLO: 99.9% target
Current (30d): 99.96%
Error Budget: 73% remain
P50 Response: 0.8s
P95 Response: 3.1s
P99 Response: 7.4s

Active Alerts

Latency P95: Normal
Error rate: 0.3% ✓
Token budget: 84% remain
RAG recall: 0.93 ✓
Latency drift: +120ms watch

Langfuse Trace Explorer

Avg Span Breakdown

API Gateway: 12ms
Auth + RBAC: 8ms
RAG retrieval: 42ms
Guardrail check: 18ms
LLM inference: 1,240ms
Tool execution: 84ms
Total E2E: 1,452ms
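Summing the listed spans is a quick way to see where end-to-end time goes and how much of it is not attributed to any named span. The sketch below uses only the figures from the breakdown above. Reading the remainder as network or queueing overhead would be an assumption, since the page does not explain it.

```python
# Average span durations in milliseconds, as listed in the breakdown.
spans_ms = {
    "API Gateway": 12,
    "Auth + RBAC": 8,
    "RAG retrieval": 42,
    "Guardrail check": 18,
    "LLM inference": 1240,
    "Tool execution": 84,
}
total_e2e_ms = 1452

attributed_ms = sum(spans_ms.values())
unattributed_ms = total_e2e_ms - attributed_ms
llm_share = spans_ms["LLM inference"] / total_e2e_ms

if __name__ == "__main__":
    print(f"attributed: {attributed_ms}ms, unattributed: {unattributed_ms}ms")
    print(f"LLM inference share of E2E: {llm_share:.1%}")
```

With these numbers the named spans account for 1,404ms of the 1,452ms total, and LLM inference dominates at roughly 85% of end-to-end latency.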
Guardrails: Responsible AI Framework

NeMo Guardrails: Active Rails

Human-in-the-Loop (HITL) Gate
All consequential actions require human approval before execution. Confidence <0.85 always escalates. GDPR Article 22 compliant: no fully automated consequential decisions.
PII Detection & Scrubbing
Microsoft Presidio + custom patterns. Names, emails, NI/SSN, card numbers scrubbed from all LLM I/O before logging. 47 entity types across 12 jurisdictions.
Toxicity & Hallucination Filter
NeMo topic rails block off-topic responses. Factual grounding check cross-references every claim against retrieved context. Hallucination >5% triggers human review queue.
Rate Limiting & Abuse Prevention
Per-user token budgets at API gateway. 10× anomalous usage triggers suspension + security alert. Cloudflare WAF DDoS protection.
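The HITL gate among the rails above comes down to two conditions: the action is consequential, or the confidence is below 0.85. A minimal sketch of that rule follows; the `Recommendation` type and `needs_human` function are hypothetical names, and this plain-Python check does not use or represent the NeMo Guardrails API.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.85  # threshold stated in the HITL rail


@dataclass(frozen=True)
class Recommendation:
    action: str
    confidence: float
    consequential: bool  # e.g. settlement, decline, or reserve change


def needs_human(rec: Recommendation) -> bool:
    """Escalate every consequential action and every low-confidence output."""
    return rec.consequential or rec.confidence < CONFIDENCE_FLOOR


if __name__ == "__main__":
    print(needs_human(Recommendation("settle claim", 0.97, True)))       # True
    print(needs_human(Recommendation("summarise document", 0.90, False)))  # False
    print(needs_human(Recommendation("summarise document", 0.60, False)))  # True
```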

Audit Trail & Explainability

Immutable Decision Log
Every AI recommendation logged: input context, retrieved docs, reasoning chain, confidence, model version, user ID, timestamp. 7-year retention for regulated decisions.
Explainability (XAI)
Every recommendation includes source citations, confidence intervals, alternatives considered, and limitation disclosures. SHAP attribution for structured ML models.
Bias Monitoring
Fairness metrics tracked across protected characteristics. Disparate impact analysis monthly. EU AI Act Article 10 data governance requirements met.
Regulatory Mapping
GDPR Art. 5/22 · EU AI Act Art. 9/10/13/14 · NIST AI RMF · ISO/IEC 42001 · IEEE 7001 Transparency. Compliance evidence pack generated quarterly.
0.3%
Hallucination Rate
Target <2%
100%
HITL Coverage
Consequential acts
0
PII Leaks (30d)
Target: 0
A+
Security Grade
Mozilla Observatory
Multi-Agent Architecture: Mesh & Orchestration

Agent Mesh Topology

Orchestrator
Agent 1
Agent 2
Agent 3
Agent 4
Agent 5
Agent 6

Orchestrator decomposes tasks, routes to specialists, aggregates results, handles conflicts. All inter-agent communication via typed schemas. No agent takes external action without Orchestrator validation.

βš™οΈ Agent Patterns

ReAct β€” Reason + Act loopsAnalytical
Reflection β€” Self-critique cyclesHigh-stakes
Planning β€” Hierarchical decompositionMulti-step
RAG β€” Retrieval-augmented genKnowledge
HITL β€” Human-in-the-loopAll consequential
Tool Use β€” Function callingAll agents

πŸ”„ Temporal.io Orchestration

Active Workflows2,847
HITL Signals Pending47
Retry PolicyExp backoff Γ—3
Saga PatternCompensating txns
Durable ExecutionCrash-safe βœ“
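The retry policy listed above follows the usual exponential backoff pattern. The sketch below shows the pattern in plain Python and not through the Temporal SDK. `retry_with_backoff` is a hypothetical helper, "×3" is read here as three attempts in total, and the one-second initial delay and doubling factor are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, waiting longer after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= factor
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    # Pass a no-op sleep so the demo finishes immediately.
    print(retry_with_backoff(flaky, sleep=lambda _: None))
```

The `sleep` parameter is injectable so the waits can be recorded or skipped in tests.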

Kafka Message Bus

Topics: 47 agent topics
Throughput: 12K msgs/s
Consumer Lag: <100ms
Schema Registry: Confluent
Dead Letter Queue: Monitored

MCP Integration Layer

MCP (data sources): Active
MCP (CRM/ERP): Active
MCP (document store): Active
OAuth 2.0 auth: All connectors
JSON Schema validation: All tools
Evaluation Framework: Continuous Quality Gates
Faithfulness: 0.94 (gate ≥0.92 ✓)
Answer Relevance: 0.91 (gate ≥0.88 ✓)
Context Precision: 0.89 (gate ≥0.85 ✓)
Context Recall: 0.93 (gate ≥0.90 ✓)

Eval Suite Composition

Golden dataset: 2,847 Q&A pairs
Unit evals (per agent): 120–400 cases
Integration evals: 84 end-to-end flows
Adversarial probes: 47 jailbreak tests
LLM-as-judge: claude-opus-4-5
Human eval cadence: Weekly 5% sample

Eval-Driven Dev Flow

1. Change proposed → PR opened
Automated eval suite runs against golden dataset in CI. Results posted to PR.
2. RAGAS gate enforced
All metrics must meet thresholds. Failure blocks merge.
3. Canary deploy (5%)
Langfuse online evals on live traffic. Drift alerts trigger auto-rollback.
4. Full rollout + monitor
Weekly human eval sample. Monthly RAGAS full re-run.
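The gate in step 2 of the flow above can be sketched as a threshold comparison that a CI job might run after the eval suite finishes. The thresholds and scores are the ones shown in this section. The `gate_failures` function and the metric key names are hypothetical, and the sketch does not call the ragas library; it assumes the scores have already been computed.

```python
# Minimum score per metric, from the quality gates in this section.
GATES = {
    "faithfulness": 0.92,
    "answer_relevance": 0.88,
    "context_precision": 0.85,
    "context_recall": 0.90,
}


def gate_failures(scores: dict[str, float], gates: dict[str, float] = GATES) -> list[str]:
    """Return the metrics that are missing or below their gate; empty means pass."""
    failures = []
    for metric, minimum in gates.items():
        score = scores.get(metric)
        if score is None or score < minimum:
            failures.append(metric)
    return failures


if __name__ == "__main__":
    current = {
        "faithfulness": 0.94,
        "answer_relevance": 0.91,
        "context_precision": 0.89,
        "context_recall": 0.93,
    }
    failed = gate_failures(current)
    print("PASS" if not failed else f"BLOCK MERGE: {failed}")
```

Treating a missing metric as a failure keeps an incomplete eval run from passing the gate by accident.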
Infrastructure: Kubernetes · Scale · Resilience

Kubernetes Cluster

Cluster: EKS / GKE / AKS
Node pools: 3 (system · app · GPU)
HPA target: CPU 70% → scale
KEDA triggers: Kafka consumer lag
Spot instances: 80% non-critical
Multi-AZ: 3 zones

Data Architecture

PostgreSQL (RDS): Operational
Redis (ElastiCache): Session + cache
Pinecone / pgvector: Vector search
S3 Intelligent Tier: Documents
Kafka (MSK): Event streaming
Snowflake / BigQuery: Analytics DWH

Cost Architecture

LLM API (Anthropic): ~45% of AI cost
Vector DB: ~12% of AI cost
Compute (K8s): ~28% of AI cost
Prompt cache savings: −67% input tokens
Haiku fast-path saving: −40% LLM spend
Est. monthly total: £8–28K

πŸ” Disaster Recovery

1
Primary failure detected (<2 min)
Route53 health check fails β†’ DNS failover. Temporal promotes standby. Kafka MirrorMaker live.
2
DR validates (<5 min)
Smoke tests auto-run. PagerDuty alert to on-call. RTO target: 15 minutes.
3
Data reconciled (<15 min)
PostgreSQL read replica promoted. S3 cross-region lag <5min. RPO: 5 minutes.

πŸ“Š Capacity Planning

  • Baseline: 3 app nodes Β· 2 vCPU Β· 8GB RAM each
  • Scale trigger: Kafka consumer lag >10K msgs
  • Max scale: 20 nodes via KEDA + HPA
  • LLM concurrency: 50 parallel sessions managed
  • Vector search: Pinecone p1 β†’ p2 at 500K docs
  • DB connections: PgBouncer pool (max 500)
Documentation: Deployment Guide & Runbook

10-Week Deployment Guide

1. Week 1–2: Data Foundation & Infrastructure
Deploy K8s cluster. Provision Temporal.io, Kafka, PostgreSQL, Pinecone. Connect source systems via MCP. Establish data governance and RBAC. Run baseline eval on golden dataset.
2. Week 3–4: Core Agents Live
Deploy first 3 highest-value agents. Wire HITL approval workflows in Temporal. Configure NeMo guardrails and PII scrubbing. Set up Langfuse tracing and RAGAS eval gate.
3. Week 5–7: Full Agent Mesh
Deploy all agents. Configure Orchestrator routing. A/B test prompt variants. Enable drift detection. Train end-users on HITL workflow.
4. Week 8–10: Production Hardening
Pen test + SAST/DAST scan. Load test 10× baseline. Configure PagerDuty. Compliance review (GDPR, EU AI Act). Produce runbook. Go-live.

πŸ— 7-Layer Platform Stack

L7PresentationReact Β· Next.js Β· SSO
L6API GatewayFastAPI Β· OAuth2 Β· WAF
L5OrchestrationTemporal.io Β· LangGraph
L4Agent RuntimeNeMo Β· RAGAS Β· Tools
L3Model + ToolsClaude API Β· MCP servers
L2Data + IntegrationKafka Β· PostgreSQL Β· Redis
L1ObservabilityOTel Β· Langfuse Β· Grafana

πŸ”Œ Integration How-To

  • MCP server per data source (REST/GraphQL/gRPC)
  • OAuth 2.0 service account per enterprise system
  • Kafka topics per agent capability namespace
  • Schema registry for typed message contracts
  • Data lineage via OpenLineage β†’ Marquez
  • Webhooks for real-time event ingestion
  • dbt + Airflow for batch data refresh

RBAC User Roles

Viewer: Read dashboards
Analyst: Run queries + export
Approver: HITL decisions
Manager: Config + agents
Admin: Full platform
AI Engineer: Models + prompts

IdP via Okta/Azure AD. MFA enforced for Approver+.

Incident Runbook

  • High latency (>5s): Check Langfuse trace → vector store → LLM API status
  • RAGAS gate fail: Roll back last prompt change → notify AI engineer
  • Error spike: Circuit breaker → fallback to previous version
  • PII leak: Suspend session → DPO notification within 24h
  • HITL queue backup: Escalate to senior approver
  • Cost overrun: Auto-throttle → route to Haiku