Acceptance criteria
The ten checks that define when Olga Phase 1 work is complete and correct.
Phase 1
These criteria define done. They turn the Phase 1 promises — Olga's voice, her four capabilities, her guardrails, and her grounding — into checks someone can verify against the live GCS instance.
All ten, or not done
Phase 1 is complete only when all ten criteria pass.
The ten criteria
1. Voice fidelity. Olga's responses to the canonical Sample Interactions (Part 1, Examples 1.1–1.10) substantially match the targets: same structure, same tone, same conciseness, no anti-pattern violations. See "Olga's voice".
2. Edge case handling. Olga handles the edge case examples (Part 2, Examples 2.1–2.12) appropriately: declining out-of-scope requests honestly, respecting RBAC, and accepting user judgment without defensiveness.
3. Four capabilities functional. Each of the four Phase 1 capabilities (conversational data entry, basic retrieval, setup clarification, and priority-to-task decomposition) works end-to-end against GCS's instance data.
4. Grounding verified. Every Olga response can be traced back to its source (Layer 1, Layer 2, or Live Context) via provenance logs. See "How Olga works" and "Grounding and confirmation".
5. RBAC respected. A user authenticated as a Viewer cannot extract data via Olga that they cannot extract via the UI. This is tested with at least three role/permission combinations. See "Permissions and isolation".
6. Instance isolation verified. A user authenticated to GCS's instance cannot, through any Olga query or prompt manipulation, access another instance's data.
7. Confirmation enforced. No database write occurs without explicit user confirmation in the conversation. See "Grounding and confirmation".
8. Latency targets met. Simple queries respond in under 2 seconds, and multi-store composition queries in under 4 seconds, measured on production infrastructure under realistic load.
9. Conversation continuity. A user can navigate between components without losing conversation state. Olga's context updates with each navigation.
10. Voice review passed. Ray reviews Olga's responses across a representative session and signs off on voice fidelity before Phase 1 go-live.
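The latency criterion is the most directly measurable of the ten, so a minimal sketch of how its check might be automated is shown here. The 2-second and 4-second thresholds come from the criterion itself. Everything else is an assumption made for illustration: the `Sample` record, the `"simple"` and `"multi_store"` labels, and the rule that every individual sample must pass (the criterion does not say whether a percentile or an average is intended).

```python
from dataclasses import dataclass

# Thresholds in seconds, taken from the latency criterion.
LATENCY_TARGETS_S = {"simple": 2.0, "multi_store": 4.0}


@dataclass
class Sample:
    """One measured query. Hypothetical record shape, not Olga's real telemetry."""
    kind: str       # "simple" or "multi_store" (assumed labels)
    seconds: float  # end-to-end response time


def latency_violations(samples):
    """Return samples at or over the target for their kind.

    "Under 2 seconds" is read strictly, so exactly 2.0 counts as a miss.
    """
    return [s for s in samples if s.seconds >= LATENCY_TARGETS_S[s.kind]]


def latency_targets_met(samples):
    """Assumption: the criterion passes only if no sample violates its target."""
    return not latency_violations(samples)
```

If the team settles on a percentile instead (for example, 95% of queries under target), only `latency_targets_met` would need to change.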
How the ten group together
The criteria check four kinds of correctness. Reading them this way shows what each one protects.
| Group | Criteria | What it confirms |
|---|---|---|
| Voice and behavior | 1, 2, 10 | Olga sounds like herself, handles the hard cases honestly, and passes human review. |
| Capability | 3 | The four Phase 1 capabilities actually work against real GCS data, not just in a demo. |
| Trust and safety | 4, 5, 6, 7 | Grounding, RBAC, instance isolation, and confirmation hold on every interaction. |
| Performance and continuity | 8, 9 | Olga is fast under real load and keeps context as you move through the platform. |
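The grouping in the table can be written down as data, which makes it easy to confirm that the four groups cover all ten criteria exactly once and to report which group a failure belongs to. This is only a sketch: the group names and criterion numbers come from the table, while the function names and the `results` mapping (criterion number to pass/fail) are hypothetical.

```python
# Group membership copied from the table above.
GROUPS = {
    "Voice and behavior": {1, 2, 10},
    "Capability": {3},
    "Trust and safety": {4, 5, 6, 7},
    "Performance and continuity": {8, 9},
}
ALL_CRITERIA = set(range(1, 11))


def groups_cover_all_ten(groups=GROUPS):
    """True if every criterion 1-10 appears in exactly one group."""
    seen = [c for members in groups.values() for c in members]
    return len(seen) == len(set(seen)) and set(seen) == ALL_CRITERIA


def phase_1_done(results):
    """All ten must pass. A criterion missing from results counts as not passed."""
    return all(results.get(c, False) for c in ALL_CRITERIA)


def failing_groups(results):
    """Names of groups with at least one criterion that has not passed."""
    return [
        name
        for name, members in GROUPS.items()
        if not all(results.get(c, False) for c in members)
    ]
```

Treating a missing result as a failure keeps the check conservative: an unverified criterion cannot be mistaken for a passed one.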
Criteria 1, 2, and 10 are judged against the canonical Sample Interactions and a representative session, not a written rubric alone — voice is verified by reading Olga's actual responses. Criteria 4 through 7 are the guardrails made testable: each is a guarantee that must be demonstrable, not assumed.
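To illustrate what "demonstrable, not assumed" could look like for two of the guardrails, here is a rough sketch that scans an interaction log for ungrounded responses and unconfirmed writes. The three source names come from the grounding criterion. The log format is an assumption: the entry types `"response"`, `"confirmation"`, and `"write"`, and the `sources` and `proposal_id` fields, are hypothetical and do not describe Olga's actual provenance logs.

```python
# The three origins named in the grounding criterion.
ALLOWED_SOURCES = {"Layer 1", "Layer 2", "Live Context"}


def ungrounded_responses(log):
    """Response entries with no sources, or with a source outside the three origins."""
    return [
        entry
        for entry in log
        if entry["type"] == "response"
        and (
            not entry.get("sources")
            or not set(entry["sources"]) <= ALLOWED_SOURCES
        )
    ]


def unconfirmed_writes(log):
    """Write entries with no earlier user confirmation for the same proposal.

    Assumes log order is chronological and that each proposed write
    carries a proposal_id that the confirmation entry repeats.
    """
    confirmed = set()
    violations = []
    for entry in log:
        if entry["type"] == "confirmation":
            confirmed.add(entry["proposal_id"])
        elif entry["type"] == "write" and entry["proposal_id"] not in confirmed:
            violations.append(entry)
    return violations
```

Both functions return the offending entries and not just a boolean, so a failed guardrail check points at the specific interactions to investigate.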