RunOlga Docs

Acceptance criteria

The ten checks that define when Olga Phase 1 work is complete and correct.

Phase 1

This is part of what RunOlga delivers in Phase 1.

These criteria define done. They turn the Phase 1 promises — Olga's voice, her four capabilities, her guardrails, and her grounding — into checks someone can verify against the live GCS instance.

All ten, or not done

Phase 1 Olga work is not complete until all ten criteria are verifiably met. Each is a check, not an aspiration.

The ten criteria

  1. Voice fidelity. Olga's responses to the canonical Sample Interactions (Part 1, Examples 1.1–1.10) substantially match the targets: same structure, same tone, same conciseness, and no anti-pattern violations. See Olga's voice.

  2. Edge case handling. Olga handles the edge case examples (Part 2, Examples 2.1–2.12) appropriately — declining out-of-scope requests honestly, respecting RBAC, and accepting user judgment without defensiveness.

  3. Four capabilities functional. Each of the four Phase 1 capabilities — conversational data entry, basic retrieval, setup clarification, and priority-to-task decomposition — works end-to-end against GCS's instance data.

  4. Grounding verified. Every Olga response can be traced back to its source (Layer 1, Layer 2, or Live Context) via provenance logs. See how Olga works, and grounding and confirmation.

  5. RBAC respected. A user authenticated as a Viewer cannot extract data via Olga that they cannot extract via the UI. Tested with at least three role/permission combinations. See permissions and isolation.

  6. Instance isolation verified. A user authenticated to GCS's instance cannot, through any Olga query or prompt manipulation, access another instance's data.

  7. Confirmation enforced. No database write occurs without explicit user confirmation in the conversation. See grounding and confirmation.

  8. Latency targets met. Simple queries respond in under 2 seconds, and multi-store composition queries under 4 seconds, measured on production infrastructure under realistic load.

  9. Conversation continuity. A user can navigate between components without losing conversation state. Olga's context updates with each navigation.

  10. Voice review passed. Ray reviews Olga's responses across a representative session and signs off on voice fidelity before Phase 1 go-live.
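Criterion 8 is the one check in the list with explicit numeric thresholds, so it lends itself to a small automated gate. The sketch below is illustrative only. It assumes the strictest reading of "under 2 seconds" and "under 4 seconds", where every measured sample must be below its limit; the criterion does not say whether a percentile is intended instead. The `Sample` type and the `"simple"` / `"multi_store"` labels are hypothetical names, not part of RunOlga.

```python
from dataclasses import dataclass

# Limits from criterion 8, in seconds. The keys are hypothetical labels.
LATENCY_LIMITS = {"simple": 2.0, "multi_store": 4.0}


@dataclass(frozen=True)
class Sample:
    kind: str       # "simple" or "multi_store"
    seconds: float  # measured end-to-end response time


def latency_violations(samples):
    """Return the samples that are not strictly under the limit for their kind."""
    return [s for s in samples if s.seconds >= LATENCY_LIMITS[s.kind]]


def latency_targets_met(samples):
    """True only if both query kinds were measured and no sample violates its limit.

    Assumption: every sample must pass. Swap in a percentile if the team
    agrees on one.
    """
    kinds_seen = {s.kind for s in samples}
    return kinds_seen == set(LATENCY_LIMITS) and not latency_violations(samples)
```

Requiring both kinds to be present keeps an empty or one-sided measurement run from counting as a pass. The samples themselves would still need to come from production infrastructure under realistic load, as the criterion requires.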

How the ten group together

The criteria check four kinds of correctness. Reading them this way shows what each one protects.

Group | Criteria | What it confirms
Voice and behavior | 1, 2, 10 | Olga sounds like herself, handles the hard cases honestly, and passes human review.
Capability | 3 | The four Phase 1 capabilities actually work against real GCS data, not in demo.
Trust and safety | 4, 5, 6, 7 | Grounding, RBAC, instance isolation, and confirmation hold on every interaction.
Performance and continuity | 8, 9 | Olga is fast under real load and keeps context as you move through the platform.

Criteria 1, 2, and 10 are judged against the canonical Sample Interactions and a representative session, not a written rubric alone — voice is verified by reading Olga's actual responses. Criteria 4 through 7 are the guardrails made testable: each is a guarantee that must be demonstrable, not assumed.
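The grouping and the "all ten, or not done" rule can be written down as a small sign-off tracker. This is a minimal sketch, not part of RunOlga. The group names and criterion numbers come from this page, while `GROUPS`, `group_status`, and `phase1_complete` are hypothetical names. It assumes each criterion has already been verified by its own process, including the human voice review, and only records the outcome as a boolean.

```python
# Groups and criterion numbers as listed on this page.
GROUPS = {
    "Voice and behavior": (1, 2, 10),
    "Capability": (3,),
    "Trust and safety": (4, 5, 6, 7),
    "Performance and continuity": (8, 9),
}


def group_status(results):
    """Map each group to True only if all of its criteria are met.

    `results` maps criterion number -> bool. A missing criterion counts as
    not met, because unverified is not the same as passed.
    """
    return {
        group: all(results.get(n, False) for n in numbers)
        for group, numbers in GROUPS.items()
    }


def phase1_complete(results):
    """All ten, or not done."""
    return all(group_status(results).values())


# Example: nine criteria verified, instance isolation (6) still open.
example = {n: True for n in range(1, 11)}
example[6] = False
example_status = group_status(example)  # "Trust and safety" is False here
```

Treating a missing entry as a failure mirrors the page's stance that each criterion is a check, not an aspiration: a criterion nobody has verified does not count toward done.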
