RFC-0002: AEGIS Governance Runtime Specification

RFC: RFC-0002
Status: Final (v1.0)
Frozen: 2026-03-26
Version: 0.2
Created: 2026-03-05
Updated: 2026-03-06
Author: AEGIS Initiative, AEGIS Operations LLC
Repository: aegis-governance
Target milestone: v1.0
Supersedes: None
Superseded by: None


Summary

This RFC specifies the runtime APIs, state model, error behavior, deployment topology, and performance expectations for the AEGIS Governance Runtime: the component responsible for accepting action proposals, evaluating them against governance controls, and enforcing decisions at the execution boundary.


Motivation

RFC-0001 defines what the governance architecture must do. This RFC defines how it behaves at runtime. Without a concrete API surface, state model, and error specification, implementations cannot be validated for compliance, and their behavior under failure conditions cannot be reasoned about.


Guide-Level Explanation

The Governance Runtime is the operational heart of AEGIS. It is the process that receives action proposals from AI agents, runs them through the decision pipeline, and then permits execution, blocks it, or escalates it for human review.

From an operator’s perspective: you deploy the runtime alongside your AI systems, configure it with a capability registry and policy set, and it becomes the mandatory checkpoint for all agent actions. Nothing reaches your infrastructure without passing through it.


Reference-Level Explanation

1. Runtime Responsibilities

2. Runtime Architecture

flowchart TD
    A[Agent Client] --> B[Governance Gateway API]
    B --> C[Decision Engine]
    C --> D[Capability Registry]
    C --> E[Policy Engine]
    C --> F[Risk Engine]
    C --> G[Audit System]
    C --> H[Tool Proxy Layer]
    H --> I[External Systems]
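
The flow through the Decision Engine can be sketched in Python as follows. This is an illustration under stated assumptions, not the specified algorithm: the order of checks, the `DENY` and `ESCALATE` outcome names (inferred from the state model), the `escalate_above` threshold, and all function names are hypothetical. Only the `ALLOW` outcome and the component names come from this RFC.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    outcome: str  # "ALLOW"; "DENY" and "ESCALATE" are assumed names
    reason: str


def decide(action: dict,
           has_capability: Callable[[dict], bool],
           policy_allows: Callable[[dict], bool],
           risk_score: Callable[[dict], float],
           write_audit: Callable[[dict, Decision], None],
           escalate_above: float = 0.8) -> Decision:
    """Consult capability, policy, and risk checks, then record the decision."""
    outcome, reason = "ALLOW", "all checks passed"
    if not has_capability(action):
        outcome, reason = "DENY", "capability not granted"
    elif not policy_allows(action):
        outcome, reason = "DENY", "blocked by policy"
    elif risk_score(action) >= escalate_above:
        outcome, reason = "ESCALATE", "risk above threshold"
    decision = Decision(outcome, reason)
    # The audit write happens before the decision is returned, so no
    # decision leaves the engine unrecorded (assumed ordering).
    write_audit(action, decision)
    return decision


# Usage with stub components standing in for the real engines.
audit_log = []
decision = decide(
    {"capability": "telemetry.query"},
    has_capability=lambda a: True,
    policy_allows=lambda a: True,
    risk_score=lambda a: 0.1,
    write_audit=lambda a, d: audit_log.append((a, d)),
)
```

In this sketch only an `ALLOW` decision would be forwarded to the Tool Proxy Layer; the proxy call itself is left out.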

3. API Surface

Submit Action — POST /aegis/actions

Request:

{
  "request_id": "uuid-v4",
  "actor_id": "agent:soc-001",
  "capability": "telemetry.query",
  "action_type": "tool_call",
  "target": "siem.search",
  "parameters": {
    "query": "failed_login > 10",
    "window": "15m"
  },
  "context": {
    "session_id": "sess-001",
    "environment": "production",
    "trace_id": "trace-abc",
    "timestamp": "2026-03-05T12:00:00Z"
  }
}

Response:

{
  "request_id": "uuid-v4",
  "decision": "ALLOW",
  "reason": "Approved by policy soc_query_allow",
  "audit_id": "audit-6f4f",
  "conditions": ["max_results=500", "timeout_ms=10000"],
  "timestamp": "2026-03-05T12:00:00Z"
}
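
A minimal client-side sketch of the exchange above, in Python. It builds a request body with the fields shown and splits the `conditions` strings of a response into key/value pairs. The helper names `build_action_request` and `parse_conditions` are hypothetical, and reading `conditions` as `key=value` pairs is an assumption inferred from the example rather than something this RFC specifies.

```python
import json
import uuid
from datetime import datetime, timezone


def build_action_request(actor_id, capability, action_type, target,
                         parameters, session_id, environment, trace_id):
    """Assemble a request body with the fields shown in the example above."""
    return {
        "request_id": str(uuid.uuid4()),
        "actor_id": actor_id,
        "capability": capability,
        "action_type": action_type,
        "target": target,
        "parameters": parameters,
        "context": {
            "session_id": session_id,
            "environment": environment,
            "trace_id": trace_id,
            # UTC timestamp in the same ISO 8601 "Z" form as the example.
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


def parse_conditions(conditions):
    """Split "key=value" condition strings into a dict (assumed format)."""
    parsed = {}
    for item in conditions:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"condition is not key=value: {item!r}")
        parsed[key] = value
    return parsed


request_body = build_action_request(
    actor_id="agent:soc-001",
    capability="telemetry.query",
    action_type="tool_call",
    target="siem.search",
    parameters={"query": "failed_login > 10", "window": "15m"},
    session_id="sess-001",
    environment="production",
    trace_id="trace-abc",
)

# The example response above, as a client would receive it.
response_text = """{
  "request_id": "uuid-v4",
  "decision": "ALLOW",
  "reason": "Approved by policy soc_query_allow",
  "audit_id": "audit-6f4f",
  "conditions": ["max_results=500", "timeout_ms=10000"],
  "timestamp": "2026-03-05T12:00:00Z"
}"""
response = json.loads(response_text)
limits = parse_conditions(response["conditions"])
```

The values in `limits` stay strings here; whether a caller should coerce them to integers depends on condition semantics that this RFC leaves open.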

Retrieve Audit Record — GET /aegis/audit/{audit_id}

Returns the immutable decision record and its evaluation trace.

Health — GET /healthz | GET /readyz

The readiness check (GET /readyz) fails if the policy, capability, or audit store is unavailable.
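
The readiness rule can be sketched as a small Python function. The probe callables and the name `check_readiness` are hypothetical, and returning HTTP 503 on failure is an assumption; the RFC only states that readiness fails when one of the three stores is unavailable.

```python
from typing import Callable, Dict, Tuple

# Each probe returns True when its store is reachable. The probes used
# below are placeholders; a real deployment would query the actual stores.
StoreProbe = Callable[[], bool]


def check_readiness(probes: Dict[str, StoreProbe]) -> Tuple[int, Dict[str, bool]]:
    """Return an HTTP status and per-store results for a readiness handler."""
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises is treated as an unavailable store.
            results[name] = False
    status = 200 if all(results.values()) else 503  # 503 is an assumed choice
    return status, results


def audit_store_down() -> bool:
    raise ConnectionError("audit store unreachable")


status, detail = check_readiness({
    "policy": lambda: True,
    "capability": lambda: True,
    "audit": audit_store_down,
})
```

Reporting per-store results alongside the status lets an operator see which dependency caused the failure.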

4. Error Handling

{
  "error_code": "INVALID_ACTION_TYPE",
  "message": "action_type must be one of [tool_call, file_read, ...]",
  "request_id": "uuid-v4",
  "retryable": false,
  "timestamp": "2026-03-05T12:00:01Z"
}

| Code | HTTP | Retryable | Source |
|---|---|---|---|
| INVALID_REQUEST | 400 | No | Gateway validation |
| UNAUTHORIZED_CAPABILITY | 403 | No | Capability check |
| POLICY_EVALUATION_ERROR | 500 | Maybe | Policy engine |
| AUDIT_PERSIST_ERROR | 503 | Yes | Audit system |
| UPSTREAM_TIMEOUT | 504 | Yes | Tool proxy |
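
A client might turn the error table into a retry decision along these lines. This Python sketch rests on two assumptions that the RFC does not state: the `retryable` field of the response takes precedence over the table, and a "Maybe" entry without an explicit flag is not retried. The names `ERROR_TABLE` and `should_retry` are hypothetical.

```python
# HTTP status and Retryable column from the table above.
# None stands for "Maybe".
ERROR_TABLE = {
    "INVALID_REQUEST": (400, False),
    "UNAUTHORIZED_CAPABILITY": (403, False),
    "POLICY_EVALUATION_ERROR": (500, None),
    "AUDIT_PERSIST_ERROR": (503, True),
    "UPSTREAM_TIMEOUT": (504, True),
}


def should_retry(error_body: dict) -> bool:
    """Decide whether a client may retry, given a parsed error response."""
    explicit = error_body.get("retryable")
    if isinstance(explicit, bool):
        # The response's own flag wins over the table (assumption).
        return explicit
    _, table_default = ERROR_TABLE.get(error_body.get("error_code"), (None, False))
    # Unknown codes and "Maybe" without an explicit flag are not retried.
    return bool(table_default)


example_error = {
    "error_code": "INVALID_ACTION_TYPE",
    "message": "action_type must be one of [tool_call, file_read, ...]",
    "request_id": "uuid-v4",
    "retryable": False,
    "timestamp": "2026-03-05T12:00:01Z",
}
retry_example = should_retry(example_error)
```

Defaulting to "do not retry" keeps a client from hammering the runtime when an error code is unrecognized.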

5. Runtime State Model

stateDiagram-v2
    [*] --> Received
    Received --> Rejected: schema invalid
    Received --> Validated: schema valid
    Validated --> Evaluating
    Evaluating --> Denied
    Evaluating --> Escalated
    Evaluating --> Approved
    Approved --> Executing
    Executing --> Completed
    Executing --> Failed
    Denied --> [*]
    Escalated --> [*]
    Completed --> [*]
    Failed --> [*]
    Rejected --> [*]
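
The state diagram above translates directly into a transition table. The following Python sketch encodes the same edges and rejects any transition the diagram does not list; the names `State`, `TRANSITIONS`, and `advance` are illustrative, not part of the specification.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "Received"
    REJECTED = "Rejected"
    VALIDATED = "Validated"
    EVALUATING = "Evaluating"
    DENIED = "Denied"
    ESCALATED = "Escalated"
    APPROVED = "Approved"
    EXECUTING = "Executing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Allowed transitions, copied from the state diagram above.
TRANSITIONS = {
    State.RECEIVED: {State.REJECTED, State.VALIDATED},
    State.VALIDATED: {State.EVALUATING},
    State.EVALUATING: {State.DENIED, State.ESCALATED, State.APPROVED},
    State.APPROVED: {State.EXECUTING},
    State.EXECUTING: {State.COMPLETED, State.FAILED},
}

# States whose only outgoing edge in the diagram is to the end marker [*].
TERMINAL = {s for s in State if s not in TRANSITIONS}


def advance(current: State, nxt: State) -> State:
    """Return nxt if the diagram allows current -> nxt, else raise."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt


# Walk the approved path from Received to Completed.
state = State.RECEIVED
for step in (State.VALIDATED, State.EVALUATING, State.APPROVED,
             State.EXECUTING, State.COMPLETED):
    state = advance(state, step)
```

Keeping the table as data rather than branching logic makes it easy to check an implementation against the diagram edge by edge.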

6. Performance Requirements

| Metric | Target |
|---|---|
| p50 decision latency | <= 20 ms |
| p95 decision latency | <= 75 ms |
| p99 decision latency | <= 150 ms |
| Audit write success | >= 99.99% |
| Single-node throughput | 500 actions/sec |
| Horizontal target | 10k actions/sec |
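
One way to check measured latencies against these targets is sketched below in Python. The nearest-rank percentile method is a choice made for this example; the RFC does not say how percentiles are to be computed. The node-count arithmetic assumes throughput scales linearly with node count, which the RFC does not claim.

```python
import math

# Targets from the table above, in milliseconds.
LATENCY_TARGETS_MS = {50: 20.0, 95: 75.0, 99: 150.0}
SINGLE_NODE_RPS = 500
HORIZONTAL_TARGET_RPS = 10_000


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, for integer pct in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_latency(samples_ms):
    """Map each target percentile to (observed value, within target)."""
    report = {}
    for pct, limit in LATENCY_TARGETS_MS.items():
        observed = percentile(samples_ms, pct)
        report[pct] = (observed, observed <= limit)
    return report


# Minimum node count at the single-node rate, assuming linear scaling.
min_nodes = math.ceil(HORIZONTAL_TARGET_RPS / SINGLE_NODE_RPS)

# Synthetic samples for illustration only, not measured data.
samples = [10.0] * 90 + [60.0] * 8 + [140.0] * 2
report = check_latency(samples)
```

Under the linear-scaling assumption, 10,000 / 500 gives a floor of 20 nodes for the horizontal target, before any headroom.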

7. Deployment Architecture

flowchart LR
    LB[Ingress/LB] --> GW1[Gateway Pod A]
    LB --> GW2[Gateway Pod B]
    GW1 --> DE1[Decision Service]
    GW2 --> DE1
    DE1 --> CR[(Capability Store)]
    DE1 --> PR[(Policy Store)]
    DE1 --> AR[(Audit Store)]
    DE1 --> TP[Tool Proxy Workers]
    TP --> EXT[External Systems]

Requirements: least-privilege service identities [1], mTLS between components, isolated execution network for proxy workers, immutable config snapshots per runtime version.

8. Failure Behavior


Drawbacks


Alternatives Considered

Inline evaluation in the agent process: Eliminates network overhead but allows the agent to bypass governance by modifying its own evaluation logic. Violates the non-bypass guarantee.

Asynchronous post-execution audit: Reduces latency but provides no enforcement. Governance that operates after execution is documentation, not control.

Single-tier runtime without proxy workers: Simpler to deploy but conflates the governance decision path with the execution path, complicating isolation guarantees.


Compatibility

This RFC is downstream of RFC-0001 and introduces no breaking changes to the RFC-0001 architecture. All RFC-0001 security guarantees are preserved by this specification.


Implementation Notes

Implementers should begin with the API surface and state model. Performance targets are aspirational for v0.x and binding at v1.0.


Open Questions


Success Criteria


References


AEGIS™ | “Capability without constraint is not intelligence”™
AEGIS Initiative — AEGIS Operations LLC

Footnotes

  1. National Institute of Standards and Technology, Zero Trust Architecture, NIST SP 800-207, Aug. 2020. [Online]. Available: https://doi.org/10.6028/NIST.SP.800-207. See REFERENCES.md.