Gemini Live Agent Challenge: Submission Readiness
When building Digital Persona, our goal was not just to experiment. It was to create a robust, production-ready system that answers the core prompt of the Gemini Live Agent Challenge: moving beyond the text box to create an immersive, real-time experience.
This document serves as our definitive checklist to ensure our project meets and exceeds all Devpost submission requirements and judging criteria.
1. Core Technological Requirements
Our system is engineered specifically for the "Live Agents" category, focusing on real-time, interruptible audio/vision interaction.
- Leverage a Gemini Model: We utilize the native multimodal capabilities of Gemini 2.5 Flash.
- Use Google GenAI SDK or ADK: The core brain is built using the official Google GenAI SDK to establish the continuous WebSocket connection.
- Use Google Cloud Services: The entire application backend is containerized and hosted securely on Google Cloud Run.
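To make "interruptible" concrete, here is a minimal sketch of one common way to handle barge-in on the client: buffered model audio is discarded the moment the user starts speaking. This is an illustration, not Digital Persona's actual code. The `PlaybackQueue` class and its method names are hypothetical, and the signal that triggers `interrupt()` (for example, an interruption notice from the live session) is assumed rather than shown.

```python
from collections import deque


class PlaybackQueue:
    """Hypothetical buffer for model audio chunks awaiting playback."""

    def __init__(self):
        self._chunks = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Store an audio chunk received from the model."""
        self._chunks.append(chunk)

    def next_chunk(self):
        """Return the next chunk to play, or None when nothing is buffered."""
        return self._chunks.popleft() if self._chunks else None

    def interrupt(self) -> int:
        """Drop all buffered audio and return how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


queue = PlaybackQueue()
for chunk in (b"a", b"b", b"c"):
    queue.enqueue(chunk)

queue.next_chunk()           # the first chunk starts playing
dropped = queue.interrupt()  # the user starts talking over the agent
print(dropped, queue.next_chunk())
```

Clearing the buffer rather than pausing it means the agent never resumes a stale sentence after the user has changed the subject.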
2. Devpost Submission Deliverables
We have meticulously prepared the required assets for the Devpost platform:
- Text Description: Our documentation provides a comprehensive summary of features, technologies used, and our specific findings regarding UI latency and visual grounding.
- Public Code Repository: The repository is public at github.com/Kshitijm7/digital-persona.
- Crucial Requirement: We have included detailed spin-up instructions in our repository guide and root README.md to ensure judges can reproduce the project locally.
- Proof of Google Cloud Deployment: We have secured a live URL (digital-persona-798468384002.us-central1.run.app) and will include UI console screenshots/logs in the Devpost image carousel verifying the Cloud Run deployment.
- Architecture Diagram: A clear visual representation showing the client, the ephemeral token API, and the Gemini WebSocket connection is available in our Architecture section.
- Demonstration Video: We are producing a video under 4 minutes long that:
- Pitches our solution (Embodied, empathetic AI vs static chatbots).
- Demonstrates our multimodal features working fully in real-time (no mockups).
- Highlights the natural barge-in/interruption capability.
3. Addressing the Judging Criteria
As we finalize our submission, we are specifically targeting the three main pillars of the challenge:
- Innovation & Multimodal User Experience (40%): Does the agent help "See, Hear, and Speak" seamlessly? Absolutely. Digital Persona completely abandons the text box paradigm. You do not type; you talk, and it listens, adopting distinct persona modes (Tutor, Guide) with an emotive 3D body.
- Technical Implementation (30%): Is the backend robust? Yes. By utilizing an ephemeral token strategy deployed on Cloud Run, we protect API keys while maintaining the low-latency direct WebSocket connection necessary for the GenAI SDK.
- Demo & Presentation (30%): Our documentation site (what you are reading now) and our upcoming video act together as a comprehensive, highly transparent presentation of our architecture and solution.
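The ephemeral token strategy mentioned under Technical Implementation can be sketched as follows. This is a hedged illustration of the idea, not our backend code: the helper names `build_token_config` and `is_usable` are hypothetical, and the default durations are assumptions. The field names (`uses`, `expire_time`, `new_session_expire_time`) mirror the ephemeral-token configuration as we understand it from the Gemini API documentation and should be verified there. The sketch only builds and checks the configuration. A real backend would pass it to the SDK's token-creation call and return the resulting token to the browser, so the long-lived API key never leaves Cloud Run.

```python
from datetime import datetime, timedelta, timezone


def build_token_config(now, session_window_s=60, lifetime_s=1800, uses=1):
    """Return a hypothetical ephemeral-token config as a plain dict.

    session_window_s: how long the client has to open a new session.
    lifetime_s: how long an already-open session may keep using the token.
    """
    if session_window_s > lifetime_s:
        raise ValueError("session window cannot outlast the token lifetime")
    return {
        "uses": uses,
        "new_session_expire_time": now + timedelta(seconds=session_window_s),
        "expire_time": now + timedelta(seconds=lifetime_s),
    }


def is_usable(config, at):
    """True while a new session may still be opened with this token."""
    return config["uses"] > 0 and at < config["new_session_expire_time"]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
config = build_token_config(now)
print(is_usable(config, now + timedelta(seconds=30)))
print(is_usable(config, now + timedelta(seconds=90)))
```

Keeping the new-session window short limits the damage if a token leaks from the browser, while the longer overall lifetime lets an established conversation continue.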
4. Bonus Point Initiatives
We believe in going above and beyond the baseline requirements. To capture Devpost bonus points, we have completed the following:
- Publish Content: We authored a detailed, deeply technical blog post explaining how we orchestrated the Gemini Live API WebSockets and tool-calling. (Hashtag: #GeminiLiveAgentChallenge)
We are incredibly excited to present Digital Persona. It represents exactly what we believe AI should be: human-centric, responsive, and deeply grounded.