NexiFlowAI - AI Presenter Backend
Low-latency voice interaction backend for the NexiFlowAI investor-demo copilot. Features human-supervised meeting assistance, explicit policy controls, replayable session state, and local MCP tool integrations.
Project Overview
The AI Presenter handles investor inquiries through low-latency voice interaction and serves as the backend for the NexiFlowAI investor-demo copilot. It is designed as a human-supervised meeting assistant with explicit policy controls, replayable session state, grounded company evidence, and a server-proxied live voice path.
Current State
The redesign is partially implemented and already usable for backend text workflows.
- Shared orchestration now routes text turns, SSE streaming, and live-context building through one canonical pipeline.
- MCP is the preferred evidence path when enabled; a direct evidence fallback sits behind the same provider contract.
- REST text turns, SSE streaming, operator controls, session state, and the local MCP server are working.
- Voice turns are buffered per turn, profiled for speaker metadata, and converted into approved context for Gemini Live.
- Real LiteLLM text generation has been validated.
- Real Gemini Live is not yet validated end to end. As of April 13, 2026, the provider still fails once the backend injects context after the turn audio.
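The "preferred MCP path with a direct fallback behind the same contract" pattern can be sketched as a wrapper provider. This is a minimal illustration, not the project's code: `ContextPacket` and `EvidenceProvider` are names from this README, but their fields, the `collect()` method, and the `McpEvidenceProvider` / `DirectEvidenceProvider` / `FallbackEvidenceProvider` classes are hypothetical. The MCP provider is stubbed so it can simulate an outage.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ContextPacket:
    """Hypothetical evidence bundle handed to generation."""
    source: str
    snippets: list[str] = field(default_factory=list)


class EvidenceProvider(Protocol):
    """Hypothetical shared contract that every evidence provider implements."""

    def collect(self, session_id: str, question: str) -> ContextPacket: ...


class DirectEvidenceProvider:
    """Fallback path: would read approved knowledge in-process (stubbed here)."""

    def collect(self, session_id: str, question: str) -> ContextPacket:
        return ContextPacket(source="direct", snippets=[f"direct evidence for: {question}"])


class McpEvidenceProvider:
    """Preferred path: would call local MCP tools; here it only simulates availability."""

    def __init__(self, available: bool) -> None:
        self.available = available

    def collect(self, session_id: str, question: str) -> ContextPacket:
        if not self.available:
            raise ConnectionError("MCP server unavailable")
        return ContextPacket(source="mcp", snippets=[f"mcp evidence for: {question}"])


class FallbackEvidenceProvider:
    """Tries the primary provider and uses the fallback on connection or timeout errors."""

    def __init__(self, primary: EvidenceProvider, fallback: EvidenceProvider) -> None:
        self.primary = primary
        self.fallback = fallback

    def collect(self, session_id: str, question: str) -> ContextPacket:
        try:
            return self.primary.collect(session_id, question)
        except (ConnectionError, TimeoutError):
            return self.fallback.collect(session_id, question)
```

Because both providers satisfy the same contract, callers never branch on which path produced the evidence; only the packet's `source` field reveals it.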
What The Backend Does
- Creates and manages live investor sessions
- Tracks conversation mode and operator overrides
- Compiles approved company knowledge from `app/knowledge/docs/company.md`
- Applies policy checks before and after generation
- Retrieves evidence through the internal evidence service or local FastMCP tools
- Serves text replies over REST and SSE
- Proxies live audio through a backend WebSocket route
- Profiles turn audio for speaker continuity and an acoustic gender estimate
- Records replayable session, turn, profile, and metric events in memory
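Replayable in-memory recording can be pictured as an append-only event log from which session state is rebuilt on demand. The sketch below assumes a simple sequence-numbered design; `SessionEvent`, `InMemoryEventLog`, and the `"mode"` payload key are hypothetical, and the event kinds only echo the categories listed above.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any


@dataclass(frozen=True)
class SessionEvent:
    seq: int
    session_id: str
    kind: str  # e.g. "session", "turn", "profile", "metric"
    payload: dict[str, Any]


@dataclass
class InMemoryEventLog:
    """Append-only, per-process event store; contents are lost on restart."""
    _events: list[SessionEvent] = field(default_factory=list)
    _seq: count = field(default_factory=count)

    def record(self, session_id: str, kind: str, payload: dict[str, Any]) -> SessionEvent:
        # Copy the payload so later mutation by the caller cannot rewrite history.
        event = SessionEvent(next(self._seq), session_id, kind, dict(payload))
        self._events.append(event)
        return event

    def replay(self, session_id: str) -> list[SessionEvent]:
        # Events come back in recording order for the requested session only.
        return [e for e in self._events if e.session_id == session_id]

    def rebuild_state(self, session_id: str) -> dict[str, Any]:
        # Fold the event stream into a small state snapshot (hypothetical fields).
        state: dict[str, Any] = {"turns": 0, "mode": None}
        for event in self.replay(session_id):
            if event.kind == "turn":
                state["turns"] += 1
            elif event.kind == "session" and "mode" in event.payload:
                state["mode"] = event.payload["mode"]
        return state
```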
Core Runtime
Text path
1. A user turn reaches `POST /api/sessions/{session_id}/turns/text` or the SSE variant.
2. `Orchestrator.prepare_turn()` runs the shared state-machine and policy path once.
3. Evidence is collected through the configured `EvidenceProvider`.
4. The system returns either an FAQ answer, a policy fallback, or a constrained LiteLLM completion.
5. The reply is validated, recorded, and exposed with citations and trace metadata.
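The branching in steps 4 and 5 can be sketched as a single function that checks policy, short-circuits on an FAQ hit, and otherwise validates a generated draft. Everything here is hypothetical: `answer_text_turn`, `TurnReply`, the FAQ table, the blocked-topic rule, and `generate_constrained` (a stand-in for the constrained LiteLLM call, similar in spirit to mock mode).

```python
from dataclasses import dataclass, field


@dataclass
class TurnReply:
    text: str
    route: str  # "faq", "policy_fallback", or "llm"
    citations: list[str] = field(default_factory=list)


FAQ = {"what does nexiflowai do?": "Hypothetical FAQ answer."}
BLOCKED_TOPICS = ("valuation",)  # hypothetical policy rule
FALLBACK_TEXT = "A team member will follow up on that."


def generate_constrained(question: str, evidence: list[str]) -> str:
    # Stand-in for a constrained LiteLLM completion grounded in the evidence.
    if not evidence:
        return "I don't have approved evidence for that."
    return f"Based on approved evidence: {evidence[0]}"


def answer_text_turn(question: str, evidence: list[str]) -> TurnReply:
    key = question.strip().lower()
    # Pre-generation policy check.
    if any(topic in key for topic in BLOCKED_TOPICS):
        return TurnReply(FALLBACK_TEXT, "policy_fallback")
    # FAQ short-circuit: no model call needed.
    if key in FAQ:
        return TurnReply(FAQ[key], "faq", ["faq"])
    draft = generate_constrained(question, evidence)
    # Post-generation validation: drafts that touch a blocked topic are replaced.
    if any(topic in draft.lower() for topic in BLOCKED_TOPICS):
        return TurnReply(FALLBACK_TEXT, "policy_fallback")
    return TurnReply(draft, "llm", [f"evidence:{i}" for i in range(len(evidence))])
```

Checking policy both before and after generation means a disallowed question never reaches the model, and a model that drifts onto a disallowed topic still cannot reach the investor.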
Voice path
1. A client connects to `WS /ws/session/{session_id}`.
2. Raw PCM audio chunks are forwarded to Gemini Live and buffered locally by turn.
3. On `end_of_turn`, the server snapshots the turn audio, prepares the turn, and runs two tasks in parallel:
- speaker profiling over the buffered PCM
- `build_live_context_from_turn()` for approved evidence and policy context
4. The approved context is injected into the Gemini Live session.
5. Gemini Live is expected to generate the audio reply.
The backend side of this flow is implemented. The remaining instability lies in the real provider sequence after context injection.
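Steps 2 and 3 can be sketched with a per-turn PCM buffer and `asyncio.gather`. This is an assumption-laden illustration: `TurnAudioBuffer`, `profile_speaker`, and `build_live_context_stub` are hypothetical (the stub only stands in for `build_live_context_from_turn()`, whose real signature is not shown here), and forwarding chunks to Gemini Live and the final injection are omitted.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TurnAudioBuffer:
    """Accumulates raw PCM chunks for the current turn."""
    chunks: list[bytes] = field(default_factory=list)

    def append(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def snapshot_and_reset(self) -> bytes:
        # Freeze this turn's audio and start a fresh buffer for the next turn.
        audio = b"".join(self.chunks)
        self.chunks.clear()
        return audio


async def profile_speaker(pcm: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for acoustic analysis
    return {"bytes": len(pcm)}


async def build_live_context_stub(session_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for evidence and policy lookup
    return f"approved context for {session_id}"


async def on_end_of_turn(session_id: str, buffer: TurnAudioBuffer) -> tuple[dict, str]:
    pcm = buffer.snapshot_and_reset()
    # Run profiling and context building concurrently; gather preserves argument order.
    profile, context = await asyncio.gather(
        profile_speaker(pcm),
        build_live_context_stub(session_id),
    )
    # In the real flow, `context` would now be injected into the Gemini Live session.
    return profile, context
```

Snapshotting before the parallel work keeps late-arriving chunks from leaking into the profile of a turn that has already ended.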
Setup
1. Create and activate a virtual environment.
2. Install dependencies:
pip install -r requirements.txt
3. Copy the environment template (PowerShell shown; on macOS or Linux, use `cp .env.example .env`):
Copy-Item .env.example .env
4. Review these values in `.env`; they select the redesign defaults:
- `ENABLE_MOCK_LLM=true` for local/backend-only validation
- `ENABLE_MCP_SERVER=true`
- `EVIDENCE_PROVIDER=fastmcp_local`
- `MCP_CLIENT_TIMEOUT_MS=1500`
- `LITELLM_MODEL=investor_text_primary`
- `GEMINI_LIVE_MODEL=gemini-3.1-flash-live-preview`
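How these variables might be parsed into typed settings is sketched below. It is a hypothetical loader (`Settings`, `load_settings`, and the boolean parsing rules are assumptions; the project may use a different settings library). It reads only the process environment, so loading `.env` into that environment is assumed to happen elsewhere. Note the conversion of `MCP_CLIENT_TIMEOUT_MS` from milliseconds to seconds.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Defaults mirror the values listed above; the real .env.example may differ.
DEFAULTS = {
    "ENABLE_MOCK_LLM": "true",
    "ENABLE_MCP_SERVER": "true",
    "EVIDENCE_PROVIDER": "fastmcp_local",
    "MCP_CLIENT_TIMEOUT_MS": "1500",
    "LITELLM_MODEL": "investor_text_primary",
    "GEMINI_LIVE_MODEL": "gemini-3.1-flash-live-preview",
}


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    enable_mock_llm: bool
    enable_mcp_server: bool
    evidence_provider: str
    mcp_client_timeout_s: float
    litellm_model: str
    gemini_live_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    def get(key: str) -> str:
        # Environment value wins; otherwise fall back to the documented default.
        return env.get(key, DEFAULTS[key])

    return Settings(
        enable_mock_llm=_as_bool(get("ENABLE_MOCK_LLM")),
        enable_mcp_server=_as_bool(get("ENABLE_MCP_SERVER")),
        evidence_provider=get("EVIDENCE_PROVIDER"),
        mcp_client_timeout_s=int(get("MCP_CLIENT_TIMEOUT_MS")) / 1000,
        litellm_model=get("LITELLM_MODEL"),
        gemini_live_model=get("GEMINI_LIVE_MODEL"),
    )
```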
The redesign no longer hides the normal evidence path behind one opaque MCP call. The main provider now dispatches explicit tool calls for `session_summary`, `faq_lookup`, and `company_search`, then rebuilds the `ContextPacket` locally.
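The explicit three-tool dispatch followed by a local `ContextPacket` rebuild can be sketched as follows. The tool names come from this README; the `call_tool` callable, the tool argument names, the `ContextPacket` fields, and `fake_tool_caller` are hypothetical stand-ins rather than the FastMCP client API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


@dataclass
class ContextPacket:
    summary: str
    faq_hits: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def gather_context(call_tool: ToolCaller, session_id: str, question: str) -> ContextPacket:
    # One explicit call per tool, instead of a single opaque MCP call.
    summary = call_tool("session_summary", {"session_id": session_id})
    faq_hits = call_tool("faq_lookup", {"question": question})
    documents = call_tool("company_search", {"query": question})
    # The packet is assembled locally from the individual tool results.
    return ContextPacket(summary=summary, faq_hits=list(faq_hits), documents=list(documents))


def fake_tool_caller(name: str, args: dict[str, Any]) -> Any:
    # Canned responses for illustration only.
    responses = {
        "session_summary": "2 prior turns",
        "faq_lookup": [],
        "company_search": ["company.md#team"],
    }
    return responses[name]
```

Keeping each tool call visible makes it possible to trace, time, or fall back per tool, which a single opaque call would hide.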
Validation status
As of April 12, 2026:
- mock-mode backend validation passed for text turns, SSE, operator controls, MCP, and the backend voice orchestration seam
As of April 13, 2026:
- real LiteLLM text generation was validated successfully
- MCP and SSE behaved correctly in the real run
- direct Gemini Live probing still failed
- the backend live WebSocket reached `context.injected` and then failed after the combined audio-plus-context sequence