Chat Message: End-to-End Runtime Flow
This page traces a single streaming chat message from the browser click to the last token rendered on screen, annotating each hop with DB queries, latency expectations, and failure modes.
Full Sequence Diagram
DB Query Count Per Request
| Service | Query | Purpose |
|---|---|---|
| Rust #1 | sessions SELECT | Authentication + role resolution |
| Rust #2 | conversations SELECT | Ownership verification |
| Python #1 | conversations SELECT | gRPC-side ownership check (redundant by design) |
| Python #2 | chat_messages INSERT | Persisting the user message |
| Python #3 | user_memories SELECT | Loading user context |
| Python #4 | hybrid_search() | Vector + full-text retrieval |
| Python #5 | chat_messages INSERT | Persisting the assistant message |
| Python #6 | user_memories UPSERT | Updating user memory |
Total: 2 Rust queries + 6 Python queries = 8 DB round-trips per streaming chat message
This is the steady-state count. First message in a new conversation adds one INSERT for the conversation row.
Latency Budget
| Segment | Typical | Worst Case |
|---|---|---|
| Browser → Next.js proxy | <1 ms | 1 ms |
| auth_middleware DB lookup | 2–5 ms | 20 ms |
| Ownership check DB lookup | 2–5 ms | 20 ms |
| gRPC channel (tonic) | 0.5 ms | 2 ms |
| Python conversation + save | 5–15 ms | 50 ms |
| Memory retrieval | 2–5 ms | 20 ms |
| Hybrid search (HNSW + GIN) | 15–50 ms | 200 ms |
| Prompt construction | <1 ms | 1 ms |
| Time to first token (cumulative) | ~30–90 ms | ~300 ms |
| Per-token delivery (LLM) | 20–50 ms | 200 ms |
| Post-generation saves | 5–10 ms | 50 ms |
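The time-to-first-token row is roughly the sum of every segment that runs before the LLM emits its first token. A quick sanity check (a sketch; the ranges are the typical values from the table above, and treating `<1 ms` as 1 ms is my assumption):

```typescript
// Typical latency ranges [low, high] in milliseconds for each segment that
// precedes the first token, in the order they appear in the table.
const preFirstTokenMs: Array<[number, number]> = [
  [1, 1],     // Browser → Next.js proxy
  [2, 5],     // auth_middleware DB lookup
  [2, 5],     // Ownership check DB lookup
  [0.5, 0.5], // gRPC channel (tonic)
  [5, 15],    // Python conversation + save
  [2, 5],     // Memory retrieval
  [15, 50],   // Hybrid search (HNSW + GIN)
  [1, 1],     // Prompt construction
];

const low = preFirstTokenMs.reduce((sum, [lo]) => sum + lo, 0);
const high = preFirstTokenMs.reduce((sum, [, hi]) => sum + hi, 0);
console.log(`time to first token ≈ ${low}–${high} ms`); // ≈ 28.5–82.5 ms
```

which lines up with the ~30–90 ms figure in the table; the worst-case column sums the same way to roughly 300 ms.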
Failure Modes
SSE error handling in browser: The chat store listens for SSE error events via EventSource.onerror. Mid-stream errors append an error token to the assistant message and set state.error in the Zustand chat store.
Rust → Python Data Fidelity
The Rust API passes user_id and conversation_id as plain strings — no type or ownership validation occurs in the Protobuf layer. Python performs its own SELECT to validate ownership against the conversations table, creating a duplicate ownership check across the language boundary. This is intentional defensive programming: the Python service cannot trust that all future callers will have performed the same validation.
However, there is a subtle inconsistency: the conversations.user_id column is VARCHAR without a foreign key to users.id. Python can insert conversations for users that do not exist in the users table. If such a “ghost” conversation is ever retrieved, the Rust API’s JOIN with users will find no matching row and the conversation will be silently omitted from list results.
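The silent omission follows directly from inner-join semantics. A minimal in-memory sketch (the row shapes are illustrative stand-ins for the `users` and `conversations` tables, not the actual schema):

```typescript
interface UserRow { id: string }
interface ConversationRow { id: string; user_id: string } // VARCHAR, no FK

const users: UserRow[] = [{ id: "u1" }];
const conversations: ConversationRow[] = [
  { id: "c1", user_id: "u1" },       // normal row
  { id: "c2", user_id: "ghost-99" }, // inserted by Python; user never existed
];

// Equivalent of:
//   SELECT c.* FROM conversations c JOIN users u ON u.id = c.user_id
const listed = conversations.filter((c) =>
  users.some((u) => u.id === c.user_id)
);

console.log(listed.map((c) => c.id)); // [ 'c1' ] — the ghost row c2 vanishes
```

A foreign key on `conversations.user_id` would reject the ghost row at insert time; absent that, a `LEFT JOIN` on the read path would at least surface the orphan instead of dropping it.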
Token Streaming from Browser’s Perspective
```tsx
// client/web/components/ai-elements/chat-area/index.tsx (simplified)
// Note: native EventSource only issues GET requests; the real implementation
// uses a fetch-based SSE client so the request can carry a POST body.
const eventSource = new EventSource(`/api/chat/conversations/${id}/stream`);

eventSource.onmessage = (event) => {
  const chunk: ChatStreamChunk = JSON.parse(event.data);
  if (chunk.token) {
    appendToken(chunk.token); // incremental render of the assistant message
  }
  if (chunk.is_final) {
    setSources(chunk.sources); // retrieval sources for the answer
    setMetrics(chunk.metrics); // timing/usage metrics
    eventSource.close();
  }
};

eventSource.addEventListener('error', () => {
  setError('Stream interrupted');
  eventSource.close();
});
```

The Zustand chat-store.ts manages the streamingMessageId state: while a message is streaming, the store marks it with a dedicated ID. When is_final arrives, the store transitions the message from streaming to complete, triggering a re-render with the full message content and sources.
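The streaming lifecycle described above can be sketched as pure state transitions (a sketch: only streamingMessageId and state.error are named in the source; the Message shape and function names are illustrative, and in the real chat-store.ts these transitions live inside a Zustand create() call):

```typescript
interface Message { id: string; content: string; sources?: string[] }

interface ChatState {
  messages: Message[];
  streamingMessageId: string | null; // set while tokens are arriving
  error: string | null;
}

// Append an SSE token to the message currently marked as streaming.
function appendToken(state: ChatState, token: string): ChatState {
  return {
    ...state,
    messages: state.messages.map((m) =>
      m.id === state.streamingMessageId
        ? { ...m, content: m.content + token }
        : m
    ),
  };
}

// is_final arrived: attach sources and transition streaming → complete.
function finishStream(state: ChatState, sources: string[]): ChatState {
  return {
    ...state,
    messages: state.messages.map((m) =>
      m.id === state.streamingMessageId ? { ...m, sources } : m
    ),
    streamingMessageId: null,
  };
}

// Usage: stream two tokens into message "m1", then finalize.
let state: ChatState = {
  messages: [{ id: "m1", content: "" }],
  streamingMessageId: "m1",
  error: null,
};
state = appendToken(state, "Hel");
state = appendToken(state, "lo");
state = finishStream(state, ["doc-1"]);
console.log(state.messages[0].content, state.streamingMessageId); // Hello null
```

Clearing streamingMessageId on completion is what makes a late-arriving token a no-op: once the stream is finalized, no message matches the streaming ID, so stray chunks cannot mutate a completed message.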