Beyond Chatbots: Building a Real-Time Financial Avatar with Google ADK


Money management is personal. Typing into a chatbox feels transactional. In the recent AI Agent Build-Off, we didn’t just want an agent that reads your data; we wanted one you could talk to: live, face-to-face, with near-zero latency.
We built a Multimodal Live Streaming Agent using Google’s Agent Development Kit (ADK). Here’s a look at the code and architecture that made it work.
The Challenge: Real-Time vs. Real-Life
Most AI agents operate on a simple request-response model: you type, wait, and get text back. But for a “Financial Avatar,” we needed:
- Bi-directional Audio: You talk; it responds instantly, and you can interrupt it mid-sentence (barge-in).
- Multimodal Context: The agent sees what you see (via webcam/screen share) to analyze charts or documents together.
- Resilience: It had to handle network jitter without “stuttering.”
Technical Architecture
Our repo isn’t just a set of prompts; it’s a full-stack streaming application.
1. The Audio Pipeline (audio-client.js)
We couldn’t rely on standard browser recording, since MediaRecorder emits compressed containers rather than raw samples. Instead, we implemented a custom AudioWorklet to process raw PCM audio.
- Input: Captures microphone audio at 16kHz.
- Output: Receives 24kHz audio chunks from the server.
- Adaptive Buffering: The code includes a “Gap Detection” algorithm. If it detects network instability (e.g., a 300ms delay), it dynamically adjusts the buffer size (adaptiveThreshold) to prevent audio glitches, trading milliseconds of latency for smoothness.
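Here’s a minimal sketch of the capture path, assuming a mono mic stream. The processor name, message shape, and the websocket handle are illustrative, not the actual identifiers from audio-client.js.

```js
// pcm-capture-processor.js (worklet side): converts each 128-sample
// render quantum from Float32 [-1, 1] to 16-bit PCM and posts it out.
class PCMCaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0][0]; // mono input
    if (channel) {
      const pcm = new Int16Array(channel.length);
      for (let i = 0; i < channel.length; i++) {
        const s = Math.max(-1, Math.min(1, channel[i]));
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
      }
      this.port.postMessage(pcm.buffer, [pcm.buffer]); // transfer, no copy
    }
    return true; // keep the processor alive
  }
}
registerProcessor('pcm-capture-processor', PCMCaptureProcessor);
```

```js
// Main thread: request a 16kHz context so the graph runs at the rate
// the model expects, then stream raw PCM chunks over the socket.
async function initMicCapture(websocket) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  await ctx.audioWorklet.addModule('pcm-capture-processor.js');
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  const node = new AudioWorkletNode(ctx, 'pcm-capture-processor');
  node.port.onmessage = (e) => websocket.send(e.data); // raw PCM out
  ctx.createMediaStreamSource(mic).connect(node);
}
```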
2. The Nervous System: Priority Queues
Handling audio, video, and data simultaneously creates a traffic jam. We built a robust Message Queue System (message-queue-manager.js) with a Strategy Pattern for overflow:
- Audio (Critical): Uses a DROP_OLDEST strategy. If the network lags, old audio packets are dropped to keep the conversation in the “now.”
- Video (Visual Context): Uses REPLACE_NEWEST. We only care about the current frame; if the queue is full, we overwrite the previous frame.
- Text (Data): Uses FAIL_SEND. We never drop financial data; if it can’t send, we alert the user.
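To make the pattern concrete, here’s a sketch of how the three policies could hang off a single queue class. The real message-queue-manager.js will differ in detail; the sizes and names here are our assumptions.

```js
// Each strategy decides what happens when a full queue gets a new message.
const OverflowStrategy = {
  DROP_OLDEST: (items, msg) => { items.shift(); items.push(msg); },   // audio
  REPLACE_NEWEST: (items, msg) => { items[items.length - 1] = msg; }, // video
  FAIL_SEND: () => { throw new Error('queue full'); }, // text: caller alerts the user
};

class MessageQueue {
  constructor(maxSize, strategy) {
    this.items = [];
    this.maxSize = maxSize;
    this.strategy = strategy;
  }
  enqueue(msg) {
    if (this.items.length >= this.maxSize) {
      this.strategy(this.items, msg); // delegate the overflow decision
    } else {
      this.items.push(msg);
    }
  }
  dequeue() { return this.items.shift(); }
}

// One queue per modality, each with the policy that fits its data.
const audioQueue = new MessageQueue(50, OverflowStrategy.DROP_OLDEST);
const videoQueue = new MessageQueue(2, OverflowStrategy.REPLACE_NEWEST);
const textQueue  = new MessageQueue(100, OverflowStrategy.FAIL_SEND);
```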
3. State Management
We moved away from spaghetti code by implementing the Observer Pattern (state-manager.js). The UI, the WebSocket connection, and the audio processor all subscribe to a central state. When the agent switches from “Listening” to “Processing,” every component reacts instantly without tight coupling.
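A minimal version of that store might look like this; updateStatusIndicator and audioProcessor are hypothetical subscribers, included only to show the decoupling.

```js
// Observer-pattern store in the spirit of state-manager.js.
class StateManager {
  constructor(initialState) {
    this.state = initialState;
    this.listeners = new Set();
  }
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener); // unsubscribe handle
  }
  setState(patch) {
    this.state = { ...this.state, ...patch };
    // Every subscriber reacts; none of them know about each other.
    this.listeners.forEach((fn) => fn(this.state));
  }
}

const appState = new StateManager({ agentStatus: 'idle' });
appState.subscribe((s) => updateStatusIndicator(s.agentStatus)); // UI
appState.subscribe((s) => audioProcessor.setMode(s.agentStatus)); // audio
appState.setState({ agentStatus: 'listening' }); // both react, zero coupling
```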
Key Features
Live Financial Consultations
Instead of just querying a database, the agent acts as a live consultant. You can share your screen showing a stock dashboard, and the agent—using the ADK’s multimodal capabilities—can analyze the trend line in real-time and offer advice verbally.
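Under the hood, a screen-share pipeline along these lines would do the job. The 1 fps sampling rate, JPEG quality, and websocket handle are our assumptions, not confirmed details of the repo.

```js
// Illustrative screen-share sampler: grab a frame roughly once per
// second and ship it as JPEG so visual context rides alongside audio.
async function startScreenShare(websocket) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Quality 0.7 keeps frames small while charts stay readable.
    canvas.toBlob((blob) => blob && websocket.send(blob), 'image/jpeg', 0.7);
  }, 1000);
}
```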
Resilience Monitoring
We built a custom debugging suite (debug-monitor.js) that runs in the browser console. It exposes real-time metrics like audioPacketsReceived, bufferHealth, and latency. This was crucial for debugging the “ghost in the machine” moments where audio would drift out of sync.
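A stripped-down version of the idea, using the metric names mentioned above (the attachment point and method names here are ours):

```js
// Console-facing monitor in the spirit of debug-monitor.js.
window.__debugMonitor = {
  metrics: { audioPacketsReceived: 0, bufferHealth: 1.0, latency: 0 },
  record(name, value) { this.metrics[name] = value; },
  dump() { console.table(this.metrics); }, // run from the browser console
};

// Elsewhere in the pipeline:
// __debugMonitor.record('audioPacketsReceived', ++packetCount);
```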
Engineering Takeaways
Building a real-time system is an order of magnitude harder than building a text bot.
- Latency is the Enemy: Every millisecond counts. We had to optimize the AudioContext scheduling loop to ensure seamless playback (see the sketch after this list).
- Fail Gracefully: Networks are imperfect. Our Queue Health Monitoring system ensures that if the connection degrades, the agent knows it and can pause rather than hallucinating or crashing.
- Separation of Concerns: The modular architecture (AppController → MultimodalClient → AudioWorklet) allowed us to swap out audio processing logic without breaking the UI.
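As promised above, here’s one way the scheduling loop stays gapless: queue each decoded 24kHz chunk against the AudioContext clock instead of playing it immediately. This is a sketch of the technique, not the exact code from the repo.

```js
// Gapless playback: schedule chunks back to back on the audio clock.
const playbackCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

function playChunk(float32Samples) {
  const buffer = playbackCtx.createBuffer(1, float32Samples.length, 24000);
  buffer.copyToChannel(float32Samples, 0);
  const src = playbackCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(playbackCtx.destination);
  // Never schedule in the past; add a small margin on (re)start.
  nextStartTime = Math.max(nextStartTime, playbackCtx.currentTime + 0.05);
  src.start(nextStartTime);
  nextStartTime += buffer.duration; // the next chunk begins exactly here
}
```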
The future of FinTech isn’t just smarter algorithms; it’s presence. With the ADK, we’re building agents that feel less like tools and more like teammates.