
We ♥ Web with Dave Bitter, Senior Front-end Consultant

The Rise of AI-Powered Voice Interfaces

At the recent We ♥ Web event, Dave Bitter, a Senior Front-end Consultant, Developer Advocate, and Google Developer Expert for Web, shared his insights on AI-powered voice interfaces.

Dave’s session took place from 13:00 to 13:45 in MLH01A30. Dave discussed his journey of exploring AI in web projects and using it to enhance websites. This topic has been a passion of his for over seven years, dating back to his time as a student.

Highlights of Dave’s Projects

PresiParrot

Dave created PresiParrot as a tool to experiment with voice recognition and completion. While it’s an older project and not perfect, it showcases how web technologies can be used for voice interactions.

Voice Recognition Evolution

  • 1970s–1980s: Early attempts at voice recognition.
  • 1990s: Larger projects started incorporating this technology.
  • 2000s: Google utilized voice recognition for search capabilities.
  • 2010s: Voice assistants like Siri and Google Assistant normalized voice technology.

Key points

  1. Changing How We Find Information: Voice interfaces are shifting how users interact with the web, making it more accessible and dynamic.

  2. AI and Web Integration: By using the web's power, we can enhance "human hardware" and create seamless voice interfaces.

SpeechRecognition API

Dave demonstrated the SpeechRecognition interface of the Web Speech API to handle voice input; its companion SpeechSynthesis interface covers spoken output.
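
A minimal sketch of what listening for voice input can look like in the browser, assuming a Chromium-based browser where the API is exposed with a webkit prefix:

```typescript
// A minimal sketch of listening for voice input with SpeechRecognition.
// Browser support varies: Chromium exposes the API with a webkit prefix,
// so we fall back to that here.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = 'en-US';
recognition.interimResults = true; // emit partial transcripts while the user speaks
recognition.continuous = false;    // stop after a single utterance

recognition.onresult = (event: any) => {
  const transcript = Array.from<any>(event.results)
    .map((result) => result[0].transcript)
    .join('');
  console.log('Heard:', transcript);
};

recognition.onerror = (event: any) => console.error('Recognition error:', event.error);

recognition.start(); // the browser will prompt for microphone access
```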

Connecting AI with Voice Interfaces

Providing Context for AI

To make AI interactions effective, Dave emphasized:

  • Defining the role of the AI.
  • Giving the AI a personality to make interactions engaging (see the prompt sketch after this list).
  • Providing feedback to the user with visual and audio cues.
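
A hedged sketch of how that context might be passed along. The system prompt below defines a role and personality; the /api/chat route, message shape, and response format are illustrative assumptions, not anything Dave prescribed:

```typescript
// A hedged sketch of giving the AI a role and personality via a system
// prompt. The /api/chat route, message shape, and response format are
// illustrative assumptions, not a specific provider's API.
const systemPrompt = [
  'You are a friendly voice assistant embedded in a website.',
  'Role: answer questions about the page in one or two spoken sentences.',
  'Personality: upbeat, concise, and a little playful.',
  'Reply in plain text suitable for text-to-speech.',
].join('\n');

async function askAssistant(userTranscript: string): Promise<string> {
  const response = await fetch('/api/chat', { // hypothetical backend route
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userTranscript },
      ],
    }),
  });
  const data = await response.json();
  return data.reply as string; // assumed response shape
}
```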

Visual Feedback

  • Idle: Subtle, inactive animation.
  • Listening: Faster, more dynamic movement.
  • Responding: Rapid and active visual feedback (a state-toggle sketch follows this list).
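
A minimal sketch of toggling between these three states, assuming a single #orb element whose CSS classes control the animation speed (both names are illustrative):

```typescript
// A minimal sketch of toggling between the three feedback states. The #orb
// element and class names are illustrative; each class could set a different
// animation-duration so the visual pace matches the state.
type AssistantState = 'idle' | 'listening' | 'responding';

function setVisualState(state: AssistantState): void {
  const orb = document.querySelector<HTMLElement>('#orb');
  if (!orb) return;
  orb.classList.remove('idle', 'listening', 'responding');
  orb.classList.add(state);
}

setVisualState('listening'); // e.g. when recognition starts capturing audio
```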

POC (Proof of Concept)

Using the browser's native speech synthesis, Dave built a rapid prototype to demonstrate how AI could power real-time voice interactions.
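
A rough sketch of that flow with the built-in SpeechSynthesis API; the voice selection and rate tweak are illustrative choices:

```typescript
// A rough sketch of the proof-of-concept flow: speak an AI text reply with
// the browser's built-in SpeechSynthesis API. No server round trip is
// needed, which keeps latency low at the cost of more robotic voices.
function speakReply(reply: string): void {
  const utterance = new SpeechSynthesisUtterance(reply);
  utterance.rate = 1.05; // slightly brisker pacing feels more conversational

  // getVoices() can be empty until the voices have loaded; this picks the
  // first English voice if one is available.
  const enVoice = speechSynthesis.getVoices().find((v) => v.lang.startsWith('en'));
  if (enVoice) utterance.voice = enVoice;

  speechSynthesis.speak(utterance);
}

speakReply('Hi! I heard you. What would you like to know?');
```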

Challenges with Latency

Dave experimented with ElevenLabs for realistic voice synthesis. While it produced impressive results, the latency was a significant drawback:

  1. Latency Issues:
    • Sending audio to the server.
    • Server processing at ElevenLabs.
    • Returning the synthesized audio to the client.
  2. Impact: High latency disrupted the user experience, much like a slow home assistant.

Solutions to Improve Performance

  • Pre-fetching audio data to reduce wait times (sketched after this list).
  • Optimizing processes to maintain the illusion of real-time interaction.
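
A hedged sketch of the pre-fetching idea: split the reply into sentences and request audio for the next sentence while the current one plays. The /synthesize endpoint is a hypothetical stand-in for a hosted service such as ElevenLabs, whose real API differs:

```typescript
// A hedged sketch of pre-fetching to mask synthesis latency: request audio
// for the next sentence while the current one plays. The /synthesize
// endpoint is a hypothetical stand-in for a hosted service such as
// ElevenLabs, whose real API differs.
async function synthesizeRemote(text: string): Promise<ArrayBuffer> {
  const response = await fetch('/synthesize', { // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  return response.arrayBuffer();
}

async function playAudio(ctx: AudioContext, data: ArrayBuffer): Promise<void> {
  const buffer = await ctx.decodeAudioData(data);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  await new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start();
  });
}

async function speakWithPrefetch(reply: string): Promise<void> {
  const ctx = new AudioContext();
  const sentences = reply.match(/[^.!?]+[.!?]?/g) ?? [reply];
  let next = synthesizeRemote(sentences[0]);

  for (let i = 0; i < sentences.length; i++) {
    const audio = await next;
    // Kick off the next request before this sentence starts playing.
    if (i + 1 < sentences.length) next = synthesizeRemote(sentences[i + 1]);
    await playAudio(ctx, audio);
  }
}
```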

Dave’s Key Philosophy

  1. AI Is a Tool, Not a Solution: AI is just another data source; it’s the UX that creates the real impact.
  2. Experimentation Is Crucial: Build, don’t just discuss. Start creating with the latest tools and techniques to discover what’s possible.
  3. Older Techniques Still Matter: It took seven years for speech recognition to become truly useful. Embrace both old and new methods.

Resources

Discover more insights from Dave Bitter at AIVA.
Explore voice interface development further at TechHub.

Dave’s message is clear: The web allows for rapid prototyping and innovation. Let’s harness the power of AI to create amazing experiences!
