Gemini Live first look: Better than talking to Siri, but worse than I'd like

August 14, 2024
Brian

Google debuted Gemini Live on Tuesday at its Made by Google event. The feature lets you hold a semi-natural spoken conversation with an AI chatbot powered by Google's latest large language model, rather than typing your queries out. TechCrunch was there to try it out firsthand.

Gemini Live is Google's answer to OpenAI's Advanced Voice Mode, ChatGPT's nearly identical feature that is currently in limited alpha testing. While OpenAI demoed its feature first, Google was the first to roll out the finished version.

In my experience, these low-latency voice features feel far more natural than texting with ChatGPT or talking to Siri or Alexa. Gemini Live answered my questions in less than two seconds and pivoted quickly when interrupted. Gemini Live isn't flawless, but it's the best way I've yet found to use your phone hands-free.

How Gemini Live Works

Before conversing with Gemini Live, you can choose from ten different voices, compared with just three from OpenAI. Google collaborated with voice actors to create each one. I enjoyed the variety, and each voice sounded remarkably human.

In one case, a Google product manager asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so children could come along. That's a far more sophisticated query than I'd pose to Siri — or Google Search, for that matter — yet Gemini successfully suggested a spot that fit the bill: Cooper-Garrod Vineyards in Saratoga.

That said, Gemini Live leaves something to be desired. It appeared to hallucinate a nearby playground, Henry Elementary School Playground, which it claimed was "10 minutes away" from the vineyard. There are playgrounds in Saratoga, but the nearest Henry Elementary School is more than two hours away. There is a Henry Ford Elementary School in Redwood City, but it's 30 minutes away.

Google was eager to demonstrate how users can interrupt Gemini Live mid-sentence and the AI will quickly pivot. The company says this lets users control the conversation. In practice, the feature didn't work flawlessly. At times, Google's product managers and Gemini Live talked over each other, and the AI didn't seem to catch what was being said.

According to product manager Leland Rechis, Google doesn't allow Gemini Live to sing or imitate any voices beyond the ten it provides. The company is likely doing this to avoid running afoul of copyright law. Rechis also said Google isn't focused on having Gemini Live recognize emotional intonation in a user's voice, something OpenAI touted during its own demo.

Overall, the feature seems like a great way to dig deeply into a topic more naturally than you would with a simple Google Search. Google describes Gemini Live as a step toward Project Astra, the fully multimodal AI model the company unveiled at Google I/O.