Part 4 — Reference Architecture: The Seven Planes
How do you organise the system so that each concern has a clear owner, a defined boundary, and a deployable footprint?
7 min · Updated June 2026
A production multimodal hybrid agentic RAG system organises into seven conceptually distinct planes. Each plane has a clear ownership boundary, a defined data contract, and a separate deployment footprint.
The seven planes

The model gateway (LiteLLM) sits horizontally across all planes: every generation, embedding, reranking, and vision API call routes through it, providing unified cost accounting, per-role model routing, cross-provider fallback, and prompt caching.
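As a rough illustration of per-role routing with cross-provider fallback, the sketch below uses plain Python rather than LiteLLM's own API. The `Route` type, the `ROUTES` table, the model names, and the `send` callable are all hypothetical placeholders; a real gateway would also record cost and handle caching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    primary: str
    fallbacks: list[str]


# Hypothetical role-to-model table; the model names are placeholders.
ROUTES = {
    "generation": Route("provider-a/large", ["provider-b/large"]),
    "embedding": Route("provider-a/embed", ["provider-b/embed"]),
}


def call_with_fallback(
    role: str,
    payload: str,
    send: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the primary model for a role, then each fallback in order.

    Returns (model_used, response). `send` stands in for the actual
    provider call and is assumed to raise RuntimeError on failure.
    """
    route = ROUTES[role]
    last_error = None
    for model in [route.primary, *route.fallbacks]:
        try:
            return model, send(model, payload)
        except RuntimeError as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError(f"all models failed for role {role!r}") from last_error
```

Because every plane calls through one function of this shape, cost accounting and fallback policy live in a single place instead of being repeated per caller.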
Control plane
The configuration and policy layer. It holds per-tenant settings, per-role prompt templates, feature flags that enable or disable agentic patterns, and the immutable audit log required for compliance. Nothing in this plane does retrieval or generation; it governs what the other planes are allowed to do.
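One way to picture what this plane holds is a read-only policy object per tenant. The sketch below is an assumption about shape only: the `TenantPolicy` class, its field names, and the flag names are hypothetical, and the audit log is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TenantPolicy:
    """Read-only policy record that other planes consult but never modify."""

    tenant_id: str
    # Prompt template per role, e.g. "grader" or "generator" (hypothetical roles).
    prompt_templates: dict[str, str] = field(default_factory=dict)
    # Feature flags for agentic patterns; unknown flags default to disabled.
    feature_flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, flag: str) -> bool:
        return self.feature_flags.get(flag, False)
```

A caller in another plane might check `policy.is_enabled("self_rag")` before running an answer-verification step; defaulting unknown flags to `False` keeps new patterns off until a tenant opts in.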
Observability plane
The measurement layer. It receives OpenTelemetry spans from every other plane, stores them in Langfuse for LLM-native trace analysis, feeds embedding drift metrics to Arize Phoenix, and runs DeepEval/Ragas CI gates on every pipeline change. Crucially, this plane is not optional instrumentation: its signals are the input to the routing decisions in the Ingestion and Retrieval planes. Removing it breaks those planes’ ability to self-correct.
Ingestion plane
The accuracy-determination layer. Documents enter here as raw files; they leave as validated, confidence-graded, contextually enriched, deduplicated chunks ready for indexing. This is where the majority of production accuracy is actually won or lost. The ingestion accuracy patterns are covered in full in Part 5.
Tooling plane
The external capability registry. When the knowledge base cannot answer a query (stale data, missing content, structured numeric lookups), the Agentic Plane invokes tools registered here: web search, SQL execution, HTTP fetch, on-demand audio transcription. Tools run in sandboxed environments with per-tenant allow-lists.
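The allow-list check can be sketched as a small registry that refuses any tool a tenant has not been granted. The `ToolRegistry` class and its method names are hypothetical, and sandboxed execution is deliberately left out; this shows only the authorisation step.

```python
from typing import Callable


class ToolRegistry:
    """Maps tool names to callables and enforces per-tenant allow-lists."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}
        self._allowed: dict[str, set[str]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._tools[name] = fn

    def allow(self, tenant_id: str, name: str) -> None:
        self._allowed.setdefault(tenant_id, set()).add(name)

    def invoke(self, tenant_id: str, name: str, **kwargs: object) -> object:
        # Deny by default: a tenant with no allow-list can call nothing.
        if name not in self._allowed.get(tenant_id, set()):
            raise PermissionError(f"{tenant_id!r} may not call {name!r}")
        return self._tools[name](**kwargs)
```

The check happens before the tool is looked up, so a tenant cannot even learn whether a tool it is not allowed to use exists.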
Indexing plane
The storage layer. Qdrant holds dense and sparse vectors with payload-level ACL filters. Kuzu holds the knowledge graph for multi-hop traversal. RAPTOR recursive summaries enable global thematic queries. Tenant isolation is enforced here — not post-hoc — so a query can never retrieve documents outside the requesting tenant’s access scope.
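Enforcing isolation at query time means the tenant filter is part of the search request itself, not a pass over the results. The sketch below builds a filter shaped like Qdrant's JSON filter conditions (`must`, `match`); the payload field names `tenant_id` and `allowed_groups` are assumptions, and the exact shape should be checked against the Qdrant version in use.

```python
def tenant_filter(tenant_id: str, user_groups: list[str]) -> dict:
    """Build a payload filter that every search for this caller must carry.

    Both conditions sit under "must", so a point is only eligible if it
    belongs to the tenant AND is shared with at least one caller group.
    """
    return {
        "must": [
            # Hypothetical payload field holding the owning tenant.
            {"key": "tenant_id", "match": {"value": tenant_id}},
            # Hypothetical payload field listing groups with read access.
            {"key": "allowed_groups", "match": {"any": user_groups}},
        ]
    }
```

Building the filter in one function, and never accepting a search without it, is what keeps out-of-scope documents from entering the candidate set in the first place.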
Retrieval plane
The search layer. It executes hybrid queries (dense + BM25 + RRF), reranks with a cross-encoder, applies query transformations (HyDE, multi-query, step-back), compresses and reorders the context window, and returns a scored, ordered candidate set to the Agentic Plane. The eight retrieval failure modes and nine accuracy patterns are covered in full in Part 6.
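The fusion step in a hybrid query can be illustrated with a minimal reciprocal rank fusion (RRF) function: each document scores the sum of 1 / (k + rank) over the rankings it appears in. The function name is illustrative, and k = 60 is a commonly used default, not a value taken from this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list.

    Ranks are 1-based; a document missing from a ranking contributes
    nothing for that ranking.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, fusing a dense ranking `["a", "b", "c"]` with a BM25 ranking `["b", "d", "a"]` places `b` first, since it sits near the top of both lists. RRF uses only ranks, so dense and sparse scores never need to be normalised onto a common scale.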
Agentic plane
The orchestration layer. A LangGraph 1.0 StateGraph that routes queries, decomposes compound questions, fans out parallel retrievals, grades retrieved context (CRAG), verifies generated answers (Self-RAG), enforces citation, and bounds correction loops. It is not a pipeline — it is a graph with conditional edges and termination conditions. The seven agentic accuracy patterns are covered in full in Part 7.
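The idea of a bounded correction loop can be shown without any framework. The sketch below is plain Python, not the LangGraph API: the function name, the callables passed in, and the fallback message are all hypothetical. It grades retrieved context, rewrites the query on failure, and stops after a fixed number of retries.

```python
from typing import Callable


def answer_with_bounded_correction(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, list[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, list[str]], str],
    max_loops: int = 2,
) -> str:
    """Retrieve, grade, and either generate or rewrite and retry.

    The loop runs at most max_loops + 1 times, so a query that never
    retrieves acceptable context still terminates.
    """
    current = query
    for _ in range(max_loops + 1):
        context = retrieve(current)
        # Conditional edge: good context goes to generation.
        if grade(query, context):
            return generate(query, context)
        # Otherwise take the correction edge and try a rewritten query.
        current = rewrite(current)
    # Termination condition reached without acceptable context.
    return "insufficient evidence"
```

The grading call always receives the original `query`, so rewrites change what is searched for without changing what counts as a relevant result.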