Beyond Chatbots: Building a Real-Time Financial Avatar with Google ADK


Money management is personal. Typing into a chatbox feels transactional. In the recent AI Agent Build-Off, we didn’t just want an agent that reads your data; we wanted one you could talk to: live, face-to-face, with near-zero latency.
We built a Multimodal Live Streaming Agent using Google’s Agent Development Kit (ADK). Here’s a look at the code and architecture that made it work.
The Challenge: Real-Time vs. Real-Life
Most AI agents operate on a simple request-response model: you type, wait, and get text back. But for a “Financial Avatar,” we needed:
- Bi-directional Audio: You talk; it responds instantly, and you can interrupt it mid-sentence (barge-in).
- Multimodal Context: The agent sees what you see (via webcam/screen share) to analyze charts or documents together.
- Resilience: It had to handle network jitter without “stuttering.”
Technical Architecture
Our repo isn’t just a set of prompts; it’s a full-stack streaming application.
1. The Audio Pipeline (audio-client.js)
We couldn’t rely on standard browser recording, since MediaRecorder emits compressed containers rather than raw samples. Instead, we implemented a custom AudioWorklet to process raw PCM audio.
- Input: Captures microphone audio at 16kHz.
- Output: Receives 24kHz audio chunks from the server.
- Adaptive Buffering: The code includes a “Gap Detection” algorithm. If it detects network instability (e.g., a 300ms delay), it dynamically adjusts the buffer size (adaptiveThreshold) to prevent audio glitches, trading milliseconds of latency for smoothness.
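Here’s a minimal sketch of the capture path, assuming a mono mic stream. The processor name, message shape, and the websocket handle are illustrative, not the actual identifiers from audio-client.js.

```js
// pcm-capture-processor.js (worklet side): converts each 128-sample
// render quantum from Float32 [-1, 1] to 16-bit PCM and posts it out.
class PCMCaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0][0]; // mono input
    if (channel) {
      const pcm = new Int16Array(channel.length);
      for (let i = 0; i < channel.length; i++) {
        const s = Math.max(-1, Math.min(1, channel[i]));
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
      }
      this.port.postMessage(pcm.buffer, [pcm.buffer]); // transfer, no copy
    }
    return true; // keep the processor alive
  }
}
registerProcessor('pcm-capture-processor', PCMCaptureProcessor);
```

```js
// Main thread: request a 16kHz context so the graph runs at the rate
// the model expects, then stream raw PCM chunks over the socket.
async function initMicCapture(websocket) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  await ctx.audioWorklet.addModule('pcm-capture-processor.js');
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  const node = new AudioWorkletNode(ctx, 'pcm-capture-processor');
  node.port.onmessage = (e) => websocket.send(e.data); // raw PCM out
  ctx.createMediaStreamSource(mic).connect(node);
}
```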
2. The Nervous System: Priority Queues
Handling audio, video, and data simultaneously creates a traffic jam. We built a robust Message Queue System (message-queue-manager.js) with a Strategy Pattern for overflow:
- Audio (Critical): Uses a DROP_OLDEST strategy. If the network lags, old audio packets are dropped to keep the conversation in the “now.”
- Video (Visual Context): Uses REPLACE_NEWEST. We only care about the current frame; if the queue is full, we overwrite the previous frame.
- Text (Data): Uses FAIL_SEND. We never drop financial data; if it can’t send, we alert the user.
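To make the pattern concrete, here’s a sketch of how the three policies could hang off a single queue class. The real message-queue-manager.js will differ in detail; the sizes and names here are our assumptions.

```js
// Each strategy decides what happens when a full queue gets a new message.
const OverflowStrategy = {
  DROP_OLDEST: (items, msg) => { items.shift(); items.push(msg); },   // audio
  REPLACE_NEWEST: (items, msg) => { items[items.length - 1] = msg; }, // video
  FAIL_SEND: () => { throw new Error('queue full'); }, // text: caller alerts the user
};

class MessageQueue {
  constructor(maxSize, strategy) {
    this.items = [];
    this.maxSize = maxSize;
    this.strategy = strategy;
  }
  enqueue(msg) {
    if (this.items.length >= this.maxSize) {
      this.strategy(this.items, msg); // delegate the overflow decision
    } else {
      this.items.push(msg);
    }
  }
  dequeue() { return this.items.shift(); }
}

// One queue per modality, each with the policy that fits its data.
const audioQueue = new MessageQueue(50, OverflowStrategy.DROP_OLDEST);
const videoQueue = new MessageQueue(2, OverflowStrategy.REPLACE_NEWEST);
const textQueue  = new MessageQueue(100, OverflowStrategy.FAIL_SEND);
```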
3. State Management
We moved away from spaghetti code by implementing the Observer Pattern (state-manager.js). The UI, the WebSocket connection, and the audio processor all subscribe to a central state. When the agent switches from “Listening” to “Processing,” every component reacts instantly without tight coupling.
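A minimal version of that store might look like this; updateStatusIndicator and audioProcessor are hypothetical subscribers, included only to show the decoupling.

```js
// Observer-pattern store in the spirit of state-manager.js.
class StateManager {
  constructor(initialState) {
    this.state = initialState;
    this.listeners = new Set();
  }
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener); // unsubscribe handle
  }
  setState(patch) {
    this.state = { ...this.state, ...patch };
    // Every subscriber reacts; none of them know about each other.
    this.listeners.forEach((fn) => fn(this.state));
  }
}

const appState = new StateManager({ agentStatus: 'idle' });
appState.subscribe((s) => updateStatusIndicator(s.agentStatus)); // UI
appState.subscribe((s) => audioProcessor.setMode(s.agentStatus)); // audio
appState.setState({ agentStatus: 'listening' }); // both react, zero coupling
```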
Key Features
Live Financial Consultations
Instead of just querying a database, the agent acts as a live consultant. You can share your screen showing a stock dashboard, and the agent—using the ADK’s multimodal capabilities—can analyze the trend line in real-time and offer advice verbally.
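Under the hood, a screen-share pipeline along these lines would do the job. The 1 fps sampling rate, JPEG quality, and websocket handle are our assumptions, not confirmed details of the repo.

```js
// Illustrative screen-share sampler: grab a frame roughly once per
// second and ship it as JPEG so visual context rides alongside audio.
async function startScreenShare(websocket) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Quality 0.7 keeps frames small while charts stay readable.
    canvas.toBlob((blob) => blob && websocket.send(blob), 'image/jpeg', 0.7);
  }, 1000);
}
```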
Resilience Monitoring
We built a custom debugging suite (debug-monitor.js) that runs in the browser console. It exposes real-time metrics like audioPacketsReceived, bufferHealth, and latency. This was crucial for debugging the “ghost in the machine” moments where audio would drift out of sync.
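A stripped-down version of the idea, using the metric names mentioned above (the attachment point and method names here are ours):

```js
// Console-facing monitor in the spirit of debug-monitor.js.
window.__debugMonitor = {
  metrics: { audioPacketsReceived: 0, bufferHealth: 1.0, latency: 0 },
  record(name, value) { this.metrics[name] = value; },
  dump() { console.table(this.metrics); }, // run from the browser console
};

// Elsewhere in the pipeline:
// __debugMonitor.record('audioPacketsReceived', ++packetCount);
```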
Engineering Takeaways
Building a real-time system is an order of magnitude harder than building a text bot.
- Latency is the Enemy: Every millisecond counts. We had to optimize the AudioContext scheduling loop to ensure seamless playback (see the sketch after this list).
- Fail Gracefully: Networks are imperfect. Our Queue Health Monitoring system ensures that if the connection degrades, the agent knows it and can pause rather than hallucinating or crashing.
- Separation of Concerns: The modular architecture (AppController → MultimodalClient → AudioWorklet) allowed us to swap out audio processing logic without breaking the UI.
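As promised above, here’s one way the scheduling loop stays gapless: queue each decoded 24kHz chunk against the AudioContext clock instead of playing it immediately. This is a sketch of the technique, not the exact code from the repo.

```js
// Gapless playback: schedule chunks back to back on the audio clock.
const playbackCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

function playChunk(float32Samples) {
  const buffer = playbackCtx.createBuffer(1, float32Samples.length, 24000);
  buffer.copyToChannel(float32Samples, 0);
  const src = playbackCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(playbackCtx.destination);
  // Never schedule in the past; add a small margin on (re)start.
  nextStartTime = Math.max(nextStartTime, playbackCtx.currentTime + 0.05);
  src.start(nextStartTime);
  nextStartTime += buffer.duration; // the next chunk begins exactly here
}
```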
The future of FinTech isn’t just smarter algorithms; it’s presence. With the ADK, we’re building agents that feel less like tools and more like teammates.