InvestOS: Agentic AI for Investment

Command Center: 100 Agents · Always On
Specialised Agents
100
6 coordinated teams
Deals Screened (QTD)
284
18 passed to DD
Active Due Diligence
4
IC memos in progress
Portfolio Watched
12
KPIs monitored daily
🤖 Six-Team Agent Status
100 agents across deal sourcing, DD, valuation, portfolio, market intel, and exit
Deal Sourcing (18 agents): 18 deals screened today
Due Diligence (20 agents): 4 active · 2 IC memos ready
Valuation (16 agents): 3 comp sets running
Portfolio Intel (18 agents): All 12 companies current
Market Research (14 agents): 12 sectors tracked
Exit Intelligence (14 agents): 2 exit-ready · CIMs drafted
📡 Live Intelligence Feed
Real-time agent activity across all 6 teams
From First Check to Final Exit
STAGE 1
🔍 Find the right deals
Stop wasting time on misfit deals. 18 sourcing agents score inbound deals, map warm paths, benchmark founders, and surface sector tailwinds before you spend an hour on a call.
STAGE 2
📋 Numbers that hold up
Every model, metric, and reference check stress-tested before it lands in front of the IC. 20 DD agents produce a complete investment memo in 8 minutes.
STAGE 3
💰 Valuation with conviction
Comparable companies, precedent transactions, DCF, liquidation preferences, and dilution modelling are all built before the term sheet. Know your entry price with precision.
STAGE 4
📊 Portfolio intelligence
18 agents monitor every portfolio company continuously: KPIs, runway, competitive threats, fundraising signals. LP updates drafted automatically.
STAGE 5
🌐 Market ahead of the crowd
14 research agents track 12 sectors, monitor VC activity, validate investment theses, and surface emerging categories before they become obvious.
STAGE 6
🚀 Exit clarity from day one
Know the buyer universe before you invest. Map M&A comps, model exit proceeds, and track exit readiness, so every decision compounds toward the outcome.
Total Agents
100
Teams
6
Active Now
84
Decisions/Day
4,700
🔍 Deal Sourcing
18 agents
Deal Flow Screener · Sector Fit Scorer · Stage Fit Validator · Check Size Matcher · Founder Background Checker · Team Completeness Checker · Traction Validator · Thesis Alignment Scorer · Warm Intro Mapper · Cold Outreach Writer · Market Timing Scorer · Founder Network Analyser · Competitive Deal Monitor · Co-investor Finder · Reference Check Coordinator · Pipeline Stage Manager · Deal Source Tracker · Deal Memo Drafter
📋 Due Diligence
20 agents
Financial Model Reviewer · Revenue Quality Analyser · Unit Economics Validator · Churn Analysis Agent · Cohort Revenue Analyser · Customer Reference Interviewer · Tech Stack Assessor · IP Ownership Verifier · Legal Red Flag Scanner · Cap Table Cleanliness Checker · Contract Risk Reviewer · Regulatory Risk Scanner · Competitive Moat Assessor · Key Person Risk Analyser · Financial Statement Auditor · KPI Consistency Checker · Data Room Completeness Checker · Founder-Market Fit Scorer · DD Summary Writer · IC Memo Drafter
💰 Valuation
16 agents
Comparable Company Analyser · Precedent Transaction Finder · DCF Model Builder · Revenue Multiple Benchmarker · Entry Valuation Scorer · Post-money Waterfall Builder · Liquidation Preference Modeller · Dilution Impact Modeller · Option Pool Shuffle Analyser · Pro-rata Rights Tracker · SAFE Conversion Modeller · Bridge Note Analyser · 409A Reference Agent · Down Round Impact Modeller · Markup/Markdown Tracker · Exit Multiple Benchmarker
📊 Portfolio Intelligence
18 agents
Portfolio Performance Tracker · KPI Dashboard Builder · Board Update Summariser · Runway Monitor · Hiring Velocity Tracker · Competitive Threat Monitor · Customer Concentration Tracker · Churn Alert Agent · Product Milestone Tracker · Fundraising Signal Detector · Bridge Risk Flagger · Follow-on Decision Modeller · Portfolio Diversification Analyser · IRR Calculator · MOIC Tracker · Write-off Risk Scanner · LP Update Drafter · Portfolio Benchmarker
🌐 Market Research
14 agents
TAM/SAM/SOM Modeller · Sector Trend Analyser · Regulatory Landscape Scanner · Competitive Map Builder · Technology Adoption Tracker · Public Comp Tracker · VC Activity Monitor · Macro Risk Assessor · Emerging Market Identifier · Customer Pain Point Researcher · Pricing Benchmark Analyser · Go-to-Market Researcher · Sector Report Synthesiser · Thesis Validation Agent
🚀 Exit Intelligence
14 agents
Exit Readiness Scorer · Strategic Buyer Mapper · Financial Buyer Identifier · M&A Comparable Finder · CIM Drafter · Management Presentation Builder · Exit Proceeds Modeller · Dual-track Process Manager · Acquirer Relationship Mapper · IPO Readiness Assessor · Secondary Market Analyst · Earn-out Structure Modeller · Post-exit Cap Table Modeller · Carry Optimisation Agent
Pre-built Workflows
3
Agents per Workflow
12–24
Avg Runtime
6 min
vs 2–3 days manual
Runs Today
18
Workflow 1: First-Look Screening (6 agents · ~4 min)
Input: pitch deck + LinkedIn URL → Output: investment committee brief with deal score, thesis fit, red flags, and recommended next step.
01
Sector Fit Scorer
Scores the deal against the fund thesis: sector, stage, geography, check size
02
Founder Background Checker
Prior exits, domain expertise, team completeness, LinkedIn credibility
03
Traction Validator
Revenue, growth rate, customer count validated against sector benchmarks
04
TAM/SAM/SOM Modeller
Bottom-up TAM validation from first principles, not deck numbers
05
Competitive Map Builder
8–12 competitors mapped, moat strength assessed
06
Deal Memo Drafter
1-page IC brief with pass/proceed recommendation synthesised
Workflow 2: Full Due Diligence Package (24 agents · ~8 min)
Input: data room access → Output: complete DD report, financial model review, red flag summary, valuation range, IC memo.
01
Financial Model Reviewer
Reviews assumptions, internal consistency, flags hockey sticks
02
Revenue Quality Analyser
ARR vs MRR, contraction, expansion, logo vs revenue churn
03
Legal Red Flag Scanner
Cap table, IP assignments, customer contracts, regulatory exposure
04
Comparable Company Analyser
Live comp set, revenue multiples, growth efficiency benchmarks
05
Entry Valuation Scorer
Bull/base/bear valuation range with methodology
06
IC Memo Drafter
Full IC memo ready for partner meeting
Workflow 3: Portfolio Exit Readiness (12 agents · quarterly)
Runs automatically across all portfolio companies. Output: exit readiness scores, buyer universes, exit timing recommendations, proceeds modelling.
01
Exit Readiness Scorer
18-indicator scoring across financials, operations, legal, and team
02
Strategic Buyer Mapper
15–20 strategic acquirers identified with rationale and contact mapping
03
M&A Comparable Finder
Last 24 months of comparable transactions with multiples
04
Exit Proceeds Modeller
Fund carry and LP distributions modelled at every exit scenario
Deals Screened (QTD)
284
Pass Rate
6%
Quality over volume
Warm Paths Found
47
Avg Screen Time
4 min
vs 2hr manual
🔍 Deal Sourcing: 18 Agents
18 sourcing agents work in parallel. Deal Flow Screener scores every inbound deal against the fund thesis in 30 seconds, validating stage, sector, geography, and check size before any analyst time is spent. Sector Fit Scorer, Stage Fit Validator, and Check Size Matcher eliminate 80%+ of misfit deals instantly. Warm Intro Mapper identifies the shortest 2nd-degree path from the fund network to each founder via LinkedIn, portfolio company connections, the LP network, and co-investor introductions. Thesis Alignment Scorer reasons about why a deal fits or doesn't, with specific evidence rather than just a number. Deal Memo Drafter synthesises all inputs into a 1-page IC brief with a clear pass/proceed recommendation.
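The four screening criteria named above (stage, sector, geography, check size) can be illustrated with a minimal sketch. The `FundThesis` and `Deal` types, the equal weighting, and the `screen_deal` function are all hypothetical; the source does not describe how InvestOS actually scores deals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundThesis:
    # Hypothetical thesis definition; field names are assumptions.
    sectors: frozenset
    stages: frozenset
    geographies: frozenset
    min_check: float
    max_check: float


@dataclass(frozen=True)
class Deal:
    name: str
    sector: str
    stage: str
    geography: str
    check_size: float


def screen_deal(deal: Deal, thesis: FundThesis) -> tuple:
    """Return (fit score in [0, 1], list of failed criteria).

    Assumption: each criterion carries equal weight.
    """
    checks = {
        "sector": deal.sector in thesis.sectors,
        "stage": deal.stage in thesis.stages,
        "geography": deal.geography in thesis.geographies,
        "check_size": thesis.min_check <= deal.check_size <= thesis.max_check,
    }
    failed = [name for name, ok in checks.items() if not ok]
    score = (len(checks) - len(failed)) / len(checks)
    return score, failed


if __name__ == "__main__":
    thesis = FundThesis(
        sectors=frozenset({"fintech", "healthtech"}),
        stages=frozenset({"seed", "series_a"}),
        geographies=frozenset({"UK", "EU"}),
        min_check=250_000,
        max_check=2_000_000,
    )
    deal = Deal("ExampleCo", "fintech", "series_b", "UK", 500_000)
    print(screen_deal(deal, thesis))
```

Returning the failed criteria alongside the score keeps the result explainable, in the spirit of giving evidence rather than just a number.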
Active DD
4
IC Memos Ready
2
Red Flags Surfaced
7
DD Package Time
8 min
vs 3 weeks manual
📋 Due Diligence: 20 Agents
20 DD agents produce a complete IC package in under 8 minutes. Financial Model Reviewer checks internal consistency and validates every assumption, flagging hockey sticks and illogical ratios. Revenue Quality Analyser stress-tests ARR, churn, expansion, and contraction against sector benchmarks. Legal Red Flag Scanner reads every contract for IP assignment gaps, change-of-control clauses, and unusual terms. Customer Reference Interviewer structures and scores reference calls systematically. KPI Consistency Checker validates every number in the deck against the data room and surfaces mismatches automatically. IC Memo Drafter synthesises all outputs into a partner-ready investment memo with a recommended decision and key open items.
Valuation Models
18
Comp Sets Built
24
Avg Entry Multiple
8.4×
ARR · portfolio avg
Dilution Scenarios
156
💰 Valuation: 16 Agents
16 valuation agents cover every angle. Comparable Company Analyser builds a live comp set from public and private market data. Precedent Transaction Finder retrieves the last 24 months of M&A transactions with deal multiples. DCF Model Builder runs a full discounted cash flow with scenario sensitivity. Post-money Waterfall Builder models every share class, preference stack, and anti-dilution clause. SAFE Conversion Modeller shows exactly what converts at what price in every scenario. Down Round Impact Modeller calculates founder and early investor dilution under stress cases before you sign. Option Pool Shuffle Analyser quantifies the pre-money dilution trap hidden in round mechanics.
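To make the preference-stack idea concrete, here is a deliberately simplified sketch for a single investor holding non-participating preferred stock. It is not the Post-money Waterfall Builder's logic, which the source does not describe; the function name, parameters, and the single-class assumption are illustrative only.

```python
def investor_proceeds(
    exit_value: float,
    invested: float,
    ownership: float,
    pref_multiple: float = 1.0,
) -> float:
    """Proceeds for one non-participating preferred investor.

    The investor takes the larger of:
      * the liquidation preference (capped at the exit value), or
      * the as-converted share of the exit (ownership * exit_value).
    Assumptions: one preferred class, no debt, no participation cap.
    """
    preference = min(exit_value, invested * pref_multiple)
    as_converted = exit_value * ownership
    return max(preference, as_converted)


if __name__ == "__main__":
    # Hypothetical scenario: 5M invested for 20% with a 1x preference.
    for exit_value in (3_000_000, 10_000_000, 50_000_000):
        payout = investor_proceeds(exit_value, 5_000_000, 0.20)
        print(f"exit={exit_value:>12,.0f}  investor={payout:>12,.0f}")
```

In this scenario conversion only beats the preference once the exit exceeds 25M (5M / 0.20), which is the kind of breakpoint a waterfall model is meant to expose.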
Companies Monitored
12
Avg MOIC
3.4×
Runway Alerts
1
Under 6 months
LP Updates Drafted
4
📊 Portfolio Intelligence: 18 Agents
18 portfolio agents monitor every company continuously. KPI Dashboard Builder ingests monthly board updates and builds a standardised metric view across the portfolio. Runway Monitor tracks cash and flags sub-6-month positions 90 days before crisis, giving time to act. Competitive Threat Monitor watches for new entrants, funding rounds, and pricing moves in each company's market. Fundraising Signal Detector identifies which companies are quietly raising and models the follow-on decision. Write-off Risk Scanner flags early warning signs across 12 indicators. LP Update Drafter generates quarterly letters from portfolio data automatically: formatted, sourced, ready to send.
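The sub-6-month runway flag can be sketched as cash divided by monthly net burn. This is an assumed definition: the source does not say how Runway Monitor computes runway, and the function names and sample figures below are hypothetical.

```python
def runway_months(cash: float, monthly_net_burn: float) -> float:
    """Months of cash left at the current net burn.

    A company that is not burning cash is treated as having
    unlimited runway.
    """
    if monthly_net_burn <= 0:
        return float("inf")
    return cash / monthly_net_burn


def runway_alerts(positions: dict, threshold_months: float = 6.0) -> list:
    """Names of companies whose runway is below the threshold.

    `positions` maps company name -> (cash, monthly_net_burn).
    """
    return sorted(
        name
        for name, (cash, burn) in positions.items()
        if runway_months(cash, burn) < threshold_months
    )


if __name__ == "__main__":
    # Hypothetical portfolio snapshot.
    portfolio = {
        "AlphaCo": (1_200_000, 300_000),   # 4 months
        "BetaCo": (5_000_000, 250_000),    # 20 months
        "GammaCo": (800_000, 0),           # break-even
    }
    print(runway_alerts(portfolio))
```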
Sectors Monitored
12
Reports Generated
47
Emerging Themes
8
VC Moves Tracked
284
🌐 Market Research: 14 Agents
14 research agents keep the fund ahead of the market. TAM/SAM/SOM Modeller builds bottom-up market sizing from first principles, not top-down guesswork. Sector Trend Analyser monitors VC activity, regulatory shifts, and technology adoption curves across all focus sectors. Public Comp Tracker updates revenue multiples and growth benchmarks in real time. VC Activity Monitor tracks which firms are increasing or decreasing sector exposure. Thesis Validation Agent cross-references new investment theses against live market data before the first check is written. Emerging Market Identifier surfaces new categories before they're obvious: the signal before the noise.
Exit-Ready (score >75)
2
Buyer Universes Mapped
8
M&A Comps Built
24
CIMs Drafted
2
🚀 Exit Intelligence: 14 Agents
14 exit agents maximise outcomes across the portfolio. Exit Readiness Scorer runs 18 indicators quarterly, covering financial, legal, operational, and team readiness. Strategic Buyer Mapper identifies 15–20 strategic acquirers for each company with acquisition rationale and connection paths through the fund network. M&A Comparable Finder builds a live transaction comp set with deal multiples from the last 24 months. Exit Proceeds Modeller calculates fund carry and LP distributions at every exit scenario before entering a process. CIM Drafter and Management Presentation Builder produce institutional-quality sale documents in hours. Dual-track Process Manager coordinates parallel strategic and financial buyer processes.
Agents Active
100
Decisions/Day
4,700
Accuracy (DD)
94%
GP Reviews
100%
All IC decisions
📡 Live Agent Trace
All AI decisions logged and auditable
🛡 Investment AI Governance
Every recommendation is advisory; GPs decide
No autonomous investment decisions: All investment, follow-on, and exit decisions require GP approval. InvestOS generates intelligence; partners decide and sign.
Fiduciary compliance: All DD outputs include source citations. Valuation ranges include full methodology documentation. No black-box recommendations.
Data room security: Portfolio company data processed under NDA. Agent access logs maintained. No cross-company data leakage between portfolio positions.
RAGAS evaluation: All agents evaluated on faithfulness, citation accuracy, and output quality. Faithfulness threshold: 0.92. Every recommendation traceable to source.
AgentOps: Live Agent Observability

📡 Live Trace Feed

📊 Session Metrics (24h)

Total Sessions: 2,847
Avg Latency: 1.4s
P95 Latency: 3.1s
Error Rate: 0.3%
Tool Calls: 12,284
HITL Escalations: 47
RAGAS Gate: PASS ✓

💰 Cost & Tokens

Cost (24h): £847
Input Tokens: 48.2M
Output Tokens: 12.4M
Cache Hit Rate: 67%
Cost/Session: £0.30
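The Cost/Session figure follows from the 24h cost and the 24h session count shown in these panels. A small sketch of that arithmetic; the function name is an assumption, not part of InvestOS.

```python
def cost_per_session(total_cost_gbp: float, sessions: int) -> float:
    """Average spend per session over the reporting window."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return total_cost_gbp / sessions


if __name__ == "__main__":
    # Figures taken from the 24h panels: GBP 847 across 2,847 sessions.
    print(f"{cost_per_session(847, 2_847):.2f} GBP per session")  # 0.30 GBP per session
```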

🎯 RAGAS Quality Scores

Faithfulness: 0.94 ✓
Answer Relevance: 0.91 ✓
Context Precision: 0.89 ✓
Context Recall: 0.93 ✓
Hallucination Rate: 0.8%

🤖 Agent Health

All agents: Healthy
Orchestrator: Active
Tool registry: Online
MCP servers: Connected
Memory store: Healthy
MLOps / LLMOps: Model Lifecycle

🧠 Model Registry

claude-sonnet-4-5 · PRODUCTION · Primary
claude-haiku-4-5 · ROUTING · Fast path
claude-opus-4-5 · SHADOW · Complex
text-embedding-3-large · RAG · Vectors

Automatic fallback routing. Versioned in MLflow. Prompt changes require RAGAS eval gate pass.

📈 Drift Detection

Faithfulness drift (7d): +0.02 stable
Latency drift (7d): +120ms watch
Output length drift: Within ±5%
Sentiment drift: No anomaly
Alert threshold: Δ>0.05 → PagerDuty
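The alert threshold can be read as a comparison between a baseline score and the current 7-day score. The sketch below assumes the 0.05 threshold applies to score-type metrics such as faithfulness and that movement in either direction counts; both are assumptions, and the paging step itself is not shown.

```python
ALERT_DELTA = 0.05  # threshold from the drift panel


def drift_status(baseline: float, current: float, alert_delta: float = ALERT_DELTA) -> str:
    """Classify drift between two scores on a 0-1 scale.

    Returns "alert" when the absolute change exceeds the threshold,
    otherwise "stable". Treating drops and rises symmetrically is an
    assumption of this sketch.
    """
    delta = current - baseline
    return "alert" if abs(delta) > alert_delta else "stable"


if __name__ == "__main__":
    # A +0.02 move, like the faithfulness row above, stays under the threshold.
    print(drift_status(baseline=0.92, current=0.94))
    print(drift_status(baseline=0.94, current=0.86))
```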

🔀 A/B Experiment Controller

Prompt v2.3 vs v2.4: Running
CoT vs Direct: Staging

Statistical significance (p<0.05) required before promotion.
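One way to check the p<0.05 requirement is a two-sided two-proportion z-test on pass rates. The source does not say which test the controller uses, so treat this as an illustrative choice; the function name and sample counts are hypothetical.

```python
import math


def two_proportion_p_value(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in pass rates (pooled z-test).

    Assumes each eval case is an independent pass/fail trial and that
    both samples are large enough for the normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical all-pass or all-fail samples: no evidence of a difference
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical results: variant A passes 450/500 cases, variant B 400/500.
    p = two_proportion_p_value(450, 500, 400, 500)
    print(f"p-value: {p:.6f}", "-> promote" if p < 0.05 else "-> keep running")
```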

🏪 Feature Store

Vector Index: Pinecone
Dimensions: 3,072
Indexed Docs: 284K
Retrieval P95: 42ms

📦 Prompt Version Control

System prompts: Git-tracked
Few-shot examples: Versioned
Eval datasets: DVC tracked
DevSecOps: Security-First CI/CD Pipeline

🚀 CI/CD Pipeline

🔍 SAST (Semgrep + Bandit): PASS
📦 SCA (SBOM + Trivy): PASS
🧪 Unit + Integration tests: 847/847
🎯 RAGAS eval gate (≥0.92): 0.94 ✓
🔐 Secrets scan (Gitleaks): CLEAN
🐳 Container scan (Grype): 0 CRITICAL
🚢 Deploy → Kubernetes: DEPLOYED

🔐 Security Posture

RBAC (role-based access): Enforced
API keys (HashiCorp Vault): Rotated 30d
mTLS (Istio service mesh): Active
PII scrubbing (NeMo): Active
Audit log (immutable): CloudWatch
Pen test: Quarterly
SOC 2 Type II: In progress
ISO 27001: Compliant

🏗 Infrastructure as Code

Terraform: Cloud infra
Helm: K8s workloads
ArgoCD GitOps: Synced
Kustomize overlays: dev/stg/prd

♻️ Rollback & DR

RTO Target: <15 min
RPO Target: <5 min
Blue/Green Deploy: Active
Auto-rollback: Error rate >1%

📋 Regulatory Compliance

GDPR Art. 22 HITL: Enforced
EU AI Act Art. 9: Documented
NIST AI RMF: Mapped
ISO/IEC 42001: Compliant
AI Observability: OpenTelemetry + Langfuse

🔭 Observability Stack

L1 Traces: OpenTelemetry → Jaeger
L2 Metrics: Prometheus → Grafana
L3 LLM Traces: Langfuse (self-hosted)
L4 Logs: Fluentd → OpenSearch
L5 Alerts: AlertManager → PagerDuty

📊 SLO Dashboard

Availability SLO: 99.9% target
Current (30d): 99.96%
Error Budget: 73% remain
P50 Response: 0.8s
P95 Response: 3.1s
P99 Response: 7.4s

🚨 Active Alerts

Latency P95: Normal
Error rate: 0.3% ✓
Token budget: 84% remain
RAG recall: 0.93 ✓
Latency drift: +120ms watch

🔬 Langfuse Trace Explorer

📈 Avg Span Breakdown

API Gateway: 12ms
Auth + RBAC: 8ms
RAG retrieval: 42ms
Guardrail check: 18ms
LLM inference: 1,240ms
Tool execution: 84ms
Total E2E: 1,452ms
Guardrails: Responsible AI Framework

🛡 NeMo Guardrails: Active Rails

✅ Human-in-the-Loop (HITL) Gate
All consequential actions require human approval before execution. Confidence <0.85 always escalates. GDPR Article 22 compliant: no fully automated consequential decisions.
🔍 PII Detection & Scrubbing
Microsoft Presidio + custom patterns. Names, emails, NI/SSN, card numbers scrubbed from all LLM I/O before logging. 47 entity types across 12 jurisdictions.
🚫 Toxicity & Hallucination Filter
NeMo topic rails block off-topic responses. Factual grounding check cross-references every claim against retrieved context. Hallucination >5% triggers human review queue.
⏱ Rate Limiting & Abuse Prevention
Per-user token budgets at API gateway. 10× anomalous usage triggers suspension + security alert. Cloudflare WAF DDoS protection.
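The HITL gate rule above (every consequential action needs approval, and anything under 0.85 confidence escalates) can be sketched as a single predicate. The `ProposedAction` type and its fields are hypothetical; this does not use the NeMo Guardrails API.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.85  # from the HITL rail: confidence below this always escalates


@dataclass(frozen=True)
class ProposedAction:
    description: str
    consequential: bool  # hypothetical flag set by the calling agent
    confidence: float    # model-reported confidence in [0, 1]


def needs_human_approval(action: ProposedAction) -> bool:
    """True when the action must wait for a human decision."""
    return action.consequential or action.confidence < CONFIDENCE_FLOOR


if __name__ == "__main__":
    actions = [
        ProposedAction("Send IC memo recommendation", consequential=True, confidence=0.97),
        ProposedAction("Tag deal with sector label", consequential=False, confidence=0.62),
        ProposedAction("Refresh comp set cache", consequential=False, confidence=0.93),
    ]
    for action in actions:
        route = "human review" if needs_human_approval(action) else "auto"
        print(f"{action.description}: {route}")
```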

📋 Audit Trail & Explainability

📝 Immutable Decision Log
Every AI recommendation logged: input context, retrieved docs, reasoning chain, confidence, model version, user ID, timestamp. 7-year retention for regulated decisions.
🔎 Explainability (XAI)
Every recommendation includes source citations, confidence intervals, alternatives considered, and limitation disclosures. SHAP attribution for structured ML models.
⚖️ Bias Monitoring
Fairness metrics tracked across protected characteristics. Disparate impact analysis monthly. EU AI Act Article 10 data governance requirements met.
🏛 Regulatory Mapping
GDPR Art. 5/22 · EU AI Act Art. 9/10/13/14 · NIST AI RMF · ISO/IEC 42001 · IEEE 7001 Transparency. Compliance evidence pack generated quarterly.
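The source does not say how the decision log is made immutable. One common way to make a log tamper-evident is to hash-chain its entries, sketched below with the standard library. The record fields mirror those listed for the Immutable Decision Log; the function names and storage format are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {
        "input_context": "deal-123 screening",      # hypothetical values
        "retrieved_docs": ["deck.pdf"],
        "confidence": 0.91,
        "model_version": "claude-sonnet-4-5",
        "user_id": "gp-7",
        "timestamp": "2025-01-01T09:00:00Z",
    })
    print(verify_chain(log))
```

A chain like this detects edits after the fact; it does not by itself prevent deletion of the whole log, which still relies on the storage layer.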
0.3%
Hallucination Rate
Target <2%
100%
HITL Coverage
Consequential acts
0
PII Leaks (30d)
Target: 0
A+
Security Grade
Mozilla Observatory
Multi-Agent Architecture: Mesh & Orchestration

🕸 Agent Mesh Topology

Orchestrator
Agent 1
Agent 2
Agent 3
Agent 4
Agent 5
Agent 6

Orchestrator decomposes tasks, routes to specialists, aggregates results, handles conflicts. All inter-agent communication via typed schemas. No agent takes external action without Orchestrator validation.

⚙️ Agent Patterns

ReAct (Reason + Act loops): Analytical
Reflection (self-critique cycles): High-stakes
Planning (hierarchical decomposition): Multi-step
RAG (retrieval-augmented generation): Knowledge
HITL (human-in-the-loop): All consequential
Tool Use (function calling): All agents

🔄 Temporal.io Orchestration

Active Workflows: 2,847
HITL Signals Pending: 47
Retry Policy: Exp backoff ×3
Saga Pattern: Compensating txns
Durable Execution: Crash-safe ✓
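The retry policy can be illustrated in plain Python. This sketch reads "×3" as a maximum of three attempts, which is an assumption, and it does not use the Temporal SDK; the delay values and function name are illustrative.

```python
import time


def retry_with_backoff(fn, max_attempts=3, initial_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponentially growing delays.

    With the defaults the waits are 1s then 2s, and the third failure is
    re-raised to the caller. `sleep` is injectable so tests need not wait.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= factor


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_tool_call():
        # Hypothetical tool call that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("transient failure")
        return "ok"

    print(retry_with_backoff(flaky_tool_call, initial_delay=0.01))
```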

📨 Kafka Message Bus

Topics: 47 agent topics
Throughput: 12K msgs/s
Consumer Lag: <100ms
Schema Registry: Confluent
Dead Letter Queue: Monitored

🔌 MCP Integration Layer

MCP (data sources): Active
MCP (CRM/ERP): Active
MCP (document store): Active
OAuth 2.0 auth: All connectors
JSON Schema validation: All tools
Evaluation Framework: Continuous Quality Gates
0.94
Faithfulness
Gate ≥0.92 ✓
0.91
Answer Relevance
Gate ≥0.88 ✓
0.89
Context Precision
Gate ≥0.85 ✓
0.93
Context Recall
Gate ≥0.90 ✓

🧪 Eval Suite Composition

Golden dataset: 2,847 Q&A pairs
Unit evals (per agent): 120–400 cases
Integration evals: 84 end-to-end flows
Adversarial probes: 47 jailbreak tests
LLM-as-judge: claude-opus-4-5
Human eval cadence: Weekly 5% sample

🔁 Eval-Driven Dev Flow

1
Change proposed → PR opened
Automated eval suite runs against golden dataset in CI. Results posted to PR.
2
RAGAS gate enforced
All metrics must meet thresholds. Failure blocks merge.
3
Canary deploy (5%)
Langfuse online evals on live traffic. Drift alerts trigger auto-rollback.
4
Full rollout + monitor
Weekly human eval sample. Monthly RAGAS full re-run.
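The gate in step 2 amounts to comparing each metric against its threshold and blocking the merge on any shortfall. A minimal sketch using the four thresholds shown in the quality gate cards; the metric key names and the function are assumptions, and the scores are taken as already computed rather than produced by the RAGAS library here.

```python
# Thresholds from the quality gate cards; key names are assumptions.
GATES = {
    "faithfulness": 0.92,
    "answer_relevance": 0.88,
    "context_precision": 0.85,
    "context_recall": 0.90,
}


def failed_gates(scores: dict, gates: dict = GATES) -> list:
    """Metrics that miss their threshold; a missing metric counts as a failure."""
    return [
        metric
        for metric, threshold in gates.items()
        if scores.get(metric, 0.0) < threshold
    ]


if __name__ == "__main__":
    current = {
        "faithfulness": 0.94,
        "answer_relevance": 0.91,
        "context_precision": 0.89,
        "context_recall": 0.93,
    }
    failures = failed_gates(current)
    print("PASS" if not failures else f"BLOCK MERGE: {failures}")  # PASS
```

In CI, the script would exit non-zero when `failures` is non-empty so that the merge is blocked.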
Infrastructure: Kubernetes · Scale · Resilience

☸️ Kubernetes Cluster

Cluster: EKS / GKE / AKS
Node pools: 3 (system · app · GPU)
HPA target: CPU 70% → scale
KEDA triggers: Kafka consumer lag
Spot instances: 80% non-critical
Multi-AZ: 3 zones

💾 Data Architecture

PostgreSQL (RDS): Operational
Redis (ElastiCache): Session + cache
Pinecone / pgvector: Vector search
S3 Intelligent Tier: Documents
Kafka (MSK): Event streaming
Snowflake / BigQuery: Analytics DWH

💰 Cost Architecture

LLM API (Anthropic): ~45% of AI cost
Vector DB: ~12% of AI cost
Compute (K8s): ~28% of AI cost
Prompt cache savings: −67% input tokens
Haiku fast-path saving: −40% LLM spend
Est. monthly total: £8–28K

🔁 Disaster Recovery

1
Primary failure detected (<2 min)
Route53 health check fails → DNS failover. Temporal promotes standby. Kafka MirrorMaker live.
2
DR validates (<5 min)
Smoke tests auto-run. PagerDuty alert to on-call. RTO target: 15 minutes.
3
Data reconciled (<15 min)
PostgreSQL read replica promoted. S3 cross-region lag <5min. RPO: 5 minutes.

📊 Capacity Planning

  • Baseline: 3 app nodes · 2 vCPU · 8GB RAM each
  • Scale trigger: Kafka consumer lag >10K msgs
  • Max scale: 20 nodes via KEDA + HPA
  • LLM concurrency: 50 parallel sessions managed
  • Vector search: Pinecone p1 → p2 at 500K docs
  • DB connections: PgBouncer pool (max 500)
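The scaling bounds in the list above can be sketched as a proportional rule clamped between the baseline and the maximum. Treating 10K messages of lag as the per-node target is an assumption of this sketch; actual KEDA and HPA behaviour is set by their own configuration, which the source does not show.

```python
import math


def desired_nodes(
    consumer_lag: int,
    lag_per_node: int = 10_000,  # assumption: the 10K scale trigger used as a per-node target
    min_nodes: int = 3,          # baseline app nodes
    max_nodes: int = 20,         # max scale
) -> int:
    """Node count proportional to Kafka consumer lag, clamped to [min, max]."""
    wanted = math.ceil(consumer_lag / lag_per_node)
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    for lag in (0, 45_000, 1_000_000):
        print(f"lag={lag:>9,} msgs -> {desired_nodes(lag)} nodes")
```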
Documentation: Deployment Guide & Runbook

🚀 10-Week Deployment Guide

1
Week 1–2: Data Foundation & Infrastructure
Deploy K8s cluster. Provision Temporal.io, Kafka, PostgreSQL, Pinecone. Connect source systems via MCP. Establish data governance and RBAC. Run baseline eval on golden dataset.
2
Week 3–4: Core Agents Live
Deploy first 3 highest-value agents. Wire HITL approval workflows in Temporal. Configure NeMo guardrails and PII scrubbing. Set up Langfuse tracing and RAGAS eval gate.
3
Week 5–7: Full Agent Mesh
Deploy all agents. Configure Orchestrator routing. A/B test prompt variants. Enable drift detection. Train end-users on HITL workflow.
4
Week 8–10: Production Hardening
Pen test + SAST/DAST scan. Load test 10× baseline. Configure PagerDuty. Compliance review (GDPR, EU AI Act). Produce runbook. Go-live.

🏗 7-Layer Platform Stack

L7 Presentation: React · Next.js · SSO
L6 API Gateway: FastAPI · OAuth2 · WAF
L5 Orchestration: Temporal.io · LangGraph
L4 Agent Runtime: NeMo · RAGAS · Tools
L3 Model + Tools: Claude API · MCP servers
L2 Data + Integration: Kafka · PostgreSQL · Redis
L1 Observability: OTel · Langfuse · Grafana

🔌 Integration How-To

  • MCP server per data source (REST/GraphQL/gRPC)
  • OAuth 2.0 service account per enterprise system
  • Kafka topics per agent capability namespace
  • Schema registry for typed message contracts
  • Data lineage via OpenLineage → Marquez
  • Webhooks for real-time event ingestion
  • dbt + Airflow for batch data refresh

👤 RBAC User Roles

Viewer: Read dashboards
Analyst: Run queries + export
Approver: HITL decisions
Manager: Config + agents
Admin: Full platform
AI Engineer: Models + prompts

IdP via Okta/Azure AD. MFA enforced for Approver+.

📞 Incident Runbook

  • High latency (>5s): Check Langfuse trace → vector store → LLM API status
  • RAGAS gate fail: Roll back last prompt change → notify AI engineer
  • Error spike: Circuit breaker → fallback to previous version
  • PII leak: Suspend session → DPO notification within 24h
  • HITL queue backup: Escalate to senior approver
  • Cost overrun: Auto-throttle → route to Haiku