Multi-Agent Systems | in-progress

NexiFlowAI - AI Presenter Backend

Low-latency voice interaction backend for the NexiFlowAI investor-demo copilot. Features human-supervised meeting assistance, explicit policy controls, replayable session state, and local MCP tool integrations.

Python, FastAPI, Google Gemini, LiteLLM, FastMCP, WebSockets

This is the backend for the NexiFlowAI investor-demo copilot: an AI presenter that handles investor inquiries through low-latency voice interaction. The system is designed as a human-supervised meeting assistant with explicit policy controls, replayable session state, grounded company evidence, and a server-proxied live voice path.

Current State

The redesign is partially implemented and already usable for backend text workflows.

Shared orchestration now routes text turns, SSE streaming, and live-context building through one canonical pipeline.
MCP is the preferred evidence path when enabled, with direct evidence fallback behind the same contract.
REST text turns, SSE streaming, operator controls, session state, and the local MCP server are working.
Voice turns are buffered per turn, profiled for speaker metadata, and converted into approved context for Gemini Live.
Real LiteLLM text generation has been validated.
Real Gemini Live is not fully validated end to end yet. As of April 13, 2026, the provider still fails after the backend injects post-audio context.
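
The "MCP first, direct evidence as fallback, same contract" arrangement described above can be sketched roughly as follows. Only the name `EvidenceProvider` comes from this README; the `collect()` method, the `Evidence` type, and the three concrete classes are hypothetical names chosen for illustration, and the real contract may look different.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Evidence:
    source: str                                   # which path produced the evidence
    snippets: list[str] = field(default_factory=list)


class EvidenceProvider(Protocol):
    """Shared contract (assumed shape): every provider answers collect()."""

    def collect(self, query: str) -> Evidence: ...


class DirectProvider:
    """Hypothetical in-process evidence lookup."""

    def collect(self, query: str) -> Evidence:
        return Evidence(source="direct", snippets=[f"direct result for {query!r}"])


class FlakyMcpProvider:
    """Hypothetical MCP-backed provider that always times out (for the demo)."""

    def collect(self, query: str) -> Evidence:
        raise TimeoutError("MCP call exceeded its deadline")


class FallbackProvider:
    """Try the preferred provider; on a transport failure use the fallback."""

    def __init__(self, preferred: EvidenceProvider, fallback: EvidenceProvider):
        self.preferred = preferred
        self.fallback = fallback

    def collect(self, query: str) -> Evidence:
        try:
            return self.preferred.collect(query)
        except (TimeoutError, ConnectionError):
            return self.fallback.collect(query)
```

Because both paths return the same `Evidence` shape, callers do not need to know which one answered; the `source` field is only there for tracing.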

What The Backend Does

Creates and manages live investor sessions
Tracks conversation mode and operator overrides
Compiles approved company knowledge from `app/knowledge/docs/company.md`
Applies policy checks before and after generation
Retrieves evidence through the internal evidence service or local FastMCP tools
Serves text replies over REST and SSE
Proxies live audio through a backend WebSocket route
Profiles turn audio for speaker continuity and an acoustic gender estimate
Records replayable session, turn, profile, and metric events in memory
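
As an illustration of the last item, an in-memory, replayable event record can be as small as an append-only log with a global sequence number. This is a sketch under assumptions: `Event`, `InMemoryEventLog`, `record()`, and `replay()` are hypothetical names, not the backend's actual classes.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    seq: int                  # global, monotonically increasing order
    session_id: str
    kind: str                 # e.g. "session", "turn", "profile", "metric"
    payload: dict[str, Any]


class InMemoryEventLog:
    """Append-only log; nothing is persisted across restarts."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)
        self._events: dict[str, list[Event]] = defaultdict(list)

    def record(self, session_id: str, kind: str, payload: dict[str, Any]) -> Event:
        event = Event(next(self._seq), session_id, kind, dict(payload))
        self._events[session_id].append(event)
        return event

    def replay(self, session_id: str, kinds: Optional[set[str]] = None) -> list[Event]:
        """Return a session's events in recorded order, optionally filtered by kind."""
        events = self._events.get(session_id, [])
        return [e for e in events if kinds is None or e.kind in kinds]
```

Replaying a session is then just iterating `replay(session_id)` in order.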

Core Runtime

Text path

1. A user turn reaches `POST /api/sessions/{session_id}/turns/text` or the SSE variant.

2. `Orchestrator.prepare_turn()` runs the shared state-machine and policy path once.

3. Evidence is collected through the configured `EvidenceProvider`.

4. The system returns one of three results: an FAQ answer, a policy fallback, or a constrained LiteLLM completion.

5. The reply is validated, recorded, and exposed with citations and trace metadata.
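
The three-way routing in step 4 can be sketched as a small pure function. Everything named here (`PreparedTurn`, `Reply`, `answer_turn`, the fallback wording, and the order in which policy and FAQ are checked) is an assumption for illustration; the LLM call is passed in as a plain callable so the sketch does not depend on LiteLLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PreparedTurn:
    text: str
    allowed: bool                          # outcome of the pre-generation policy check
    faq_answer: Optional[str] = None       # set when an FAQ entry matched
    citations: list[str] = field(default_factory=list)


@dataclass
class Reply:
    text: str
    route: str                             # "policy_fallback", "faq", or "llm"
    citations: list[str]


# Hypothetical fallback wording.
POLICY_FALLBACK = "I can't discuss that here; a team member will follow up."


def answer_turn(turn: PreparedTurn, complete: Callable[[str, list[str]], str]) -> Reply:
    """Pick exactly one route for a prepared turn."""
    if not turn.allowed:
        return Reply(POLICY_FALLBACK, "policy_fallback", [])
    if turn.faq_answer is not None:
        return Reply(turn.faq_answer, "faq", turn.citations)
    # Only reach the model when neither policy nor the FAQ short-circuits.
    return Reply(complete(turn.text, turn.citations), "llm", turn.citations)
```

Recording the chosen `route` alongside the reply is one way to make the trace metadata mentioned in step 5 cheap to produce.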

Voice path

1. A client connects to `WS /ws/session/{session_id}`.

2. Raw PCM audio chunks are forwarded to Gemini Live and buffered locally by turn.

3. On `end_of_turn`, the server snapshots the turn audio, prepares the turn, and runs two tasks in parallel:

speaker profiling over the buffered PCM
`build_live_context_from_turn()` for approved evidence and policy context

4. The approved context is injected into the Gemini Live session.

5. Gemini Live is expected to generate the audio reply.

The backend side of that flow is implemented. The remaining instability occurs in the real provider sequence after context injection.
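
Step 3 of the voice path (snapshot the buffered audio, then run profiling and context building in parallel) maps naturally onto `asyncio.gather`. The sketch below uses stand-in coroutines: `profile_speaker`, `build_context`, and `on_end_of_turn` are hypothetical names, and `build_context` only imitates the role of `build_live_context_from_turn()`.

```python
import asyncio


async def profile_speaker(pcm: bytes) -> dict:
    """Stand-in for speaker profiling over the buffered PCM."""
    await asyncio.sleep(0)                 # yield to the loop, as real I/O would
    return {"bytes": len(pcm)}


async def build_context(turn_id: str) -> str:
    """Stand-in for build_live_context_from_turn()."""
    await asyncio.sleep(0)
    return f"approved context for turn {turn_id}"


async def on_end_of_turn(chunks: list[bytes], turn_id: str) -> tuple[dict, str]:
    pcm = b"".join(chunks)                 # snapshot this turn's audio
    chunks.clear()                         # reset the buffer for the next turn
    # Run both tasks concurrently; results come back in argument order.
    profile, context = await asyncio.gather(
        profile_speaker(pcm),
        build_context(turn_id),
    )
    return profile, context
```

Taking the snapshot before clearing the buffer keeps late-arriving chunks for the next turn from leaking into the profile of the current one.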

Setup

1. Create and activate a virtual environment.

2. Install dependencies:

`pip install -r requirements.txt`

3. Copy the environment template:

`Copy-Item .env.example .env` (PowerShell; on macOS or Linux, use `cp .env.example .env`)

4. Review these settings in `.env` if you want the redesign defaults:

`ENABLE_MOCK_LLM=true` for local/backend-only validation
`ENABLE_MCP_SERVER=true`
`EVIDENCE_PROVIDER=fastmcp_local`
`MCP_CLIENT_TIMEOUT_MS=1500`
`LITELLM_MODEL=investor_text_primary`
`GEMINI_LIVE_MODEL=gemini-3.1-flash-live-preview`
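
For readers who want to see how such settings might be consumed, here is a minimal standard-library sketch that reads the six variables above from the process environment, using the listed values as defaults. The `Settings` class and `load_settings()` are hypothetical; the project may well use a different mechanism, and this sketch does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    enable_mock_llm: bool
    enable_mcp_server: bool
    evidence_provider: str
    mcp_client_timeout_ms: int
    litellm_model: str
    gemini_live_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults mirror the values listed in the setup step.
    return Settings(
        enable_mock_llm=_as_bool(env.get("ENABLE_MOCK_LLM", "true")),
        enable_mcp_server=_as_bool(env.get("ENABLE_MCP_SERVER", "true")),
        evidence_provider=env.get("EVIDENCE_PROVIDER", "fastmcp_local"),
        mcp_client_timeout_ms=int(env.get("MCP_CLIENT_TIMEOUT_MS", "1500")),
        litellm_model=env.get("LITELLM_MODEL", "investor_text_primary"),
        gemini_live_model=env.get("GEMINI_LIVE_MODEL", "gemini-3.1-flash-live-preview"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.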

The redesign no longer hides the normal evidence path behind one opaque MCP call. The main provider now dispatches explicit tool calls for `session_summary`, `faq_lookup`, and `company_search`, then rebuilds the `ContextPacket` locally.
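
A rough sketch of that shape, with the transport abstracted away: the provider issues one call per tool and assembles the packet itself. The tool names and `ContextPacket` come from this README; the packet's fields, the tool argument names, and the `call_tool` callable are assumptions made for illustration, not the FastMCP client API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical transport: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


@dataclass
class ContextPacket:
    summary: str                 # assumed field: rolling session summary
    faq: Optional[str]           # assumed field: matched FAQ answer, if any
    passages: list[str]          # assumed field: company knowledge snippets


def build_context_packet(call_tool: ToolCaller, session_id: str, question: str) -> ContextPacket:
    """Dispatch the three explicit tool calls, then rebuild the packet locally."""
    summary = call_tool("session_summary", {"session_id": session_id})
    faq = call_tool("faq_lookup", {"question": question})
    passages = call_tool("company_search", {"query": question})
    return ContextPacket(
        summary=summary or "",
        faq=faq,
        passages=list(passages or []),
    )
```

Keeping the calls explicit means each tool's latency and failure can be traced on its own rather than being hidden inside a single aggregate call.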

Validation status

As of April 12, 2026:

mock-mode backend validation passed for text turns, SSE, operator controls, MCP, and the backend voice orchestration seam

As of April 13, 2026:

real LiteLLM text generation was validated successfully
MCP and SSE behaved correctly in the real run
direct Gemini Live probing still failed
the backend live WebSocket reached `context.injected` and then failed after the combined audio plus context sequence

Technology Manifest

Python

FastAPI

Google Gemini

LiteLLM

FastMCP

WebSockets