Mar 21

Gemini Automation: A Rough Gem


Originally reported by The Verge

Even when ordering dinner takes nine minutes, the experience still feels unmistakably like the future.

I’ve been testing Gemini’s new task automation feature on both the Pixel 10 Pro and the Galaxy S26 Ultra. It’s a significant development: Gemini can now navigate and operate apps on your behalf. The feature is in beta and limited to a handful of food delivery and rideshare platforms, it’s occasionally slow and clunky, and it doesn’t solve any of the real problems of using a phone today. Even so, it’s remarkably capable, and it isn’t an exaggeration to call it a tangible preview of how we’ll interact with our phones in the future. Widespread adoption is a long way off, but this is the first time I’ve seen a genuine AI assistant actually working on a smartphone, outside of staged keynotes or carefully managed convention hall demos.

It’s worth noting up front that Gemini works considerably slower than a human. If you need a rideshare right now, tapping through the app yourself is still the fastest option. But before dismissing the feature, consider that task automation is designed to run in the background, leaving you free to use other apps or do something else entirely. Crucially, it keeps working even when your attention is elsewhere, so you can, say, check for the third time that your passport is actually in your bag.

If you’re curious, you can watch the whole automation process in real time. As Gemini works, descriptive text appears at the bottom of the screen explaining what it’s doing; during a recent Saturday-night dinner order, one status read, “Selecting a second portion of Chicken Teriyaki for the combo.” Watching Gemini interpret a request on the fly is genuinely impressive. Asked to order a “chicken combo plate” from a menu that listed portions in half-size increments, it figured out on its own to add two half-servings to fulfill the request.

By default, Gemini runs its automation in the background; if you want to watch it work, you have to tap a button to open a separate viewing window. The watching can be genuinely frustrating. Seeing the system laboriously hunt for a side of greens on an Uber Eats menu when the item is sitting right at the top of the screen is like the moment in a horror movie when the audience knows exactly where the danger is, minus the body count. During my teriyaki order, Gemini made a few missteps, though it ultimately corrected them on its own. The whole process took about nine minutes, which is far from ideal.

Gemini is designed to carry tasks only up to the final confirmation step, whether for a rideshare or a meal delivery, so you can review everything before committing. To my mind that’s the right way to use the feature for now, and tapping the last button myself is a negligible inconvenience. Across five days of testing, Gemini never finalized an order on its own, and its accuracy was high enough that I rarely had to adjust its selections. The failures I did see, a couple of times in total, happened within the first minute or two, and usually stemmed from app-level snags like a location permission prompt or a wrong default delivery address (e.g., defaulting to Nevada from a previous use). Diagnosing those issues took my intervention, but once resolved, the automation restarted without trouble.
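In design terms, stopping before the last tap is a human-in-the-loop gate on the single irreversible action. The sketch below illustrates the general pattern, assuming hypothetical requestUserApproval and submitOrder helpers; it describes the idea, not Gemini's actual internals.

```typescript
// Hypothetical human-in-the-loop gate: the agent assembles an order
// freely, but the one irreversible step requires explicit approval.
// requestUserApproval and submitOrder are assumed helpers, not real APIs.

interface DraftOrder {
  restaurant: string;
  items: string[];
  totalCents: number;
}

declare function requestUserApproval(draft: DraftOrder): Promise<boolean>;
declare function submitOrder(draft: DraftOrder): Promise<void>;

async function finalizeWithGate(draft: DraftOrder): Promise<"placed" | "held"> {
  // Everything before this point (searching, adding items) is reversible,
  // so the agent does it autonomously. Placing the order is not.
  const approved = await requestUserApproval(draft);
  if (!approved) return "held"; // leave the cart for the user to review
  await submitOrder(draft);
  return "placed";
}
```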

The most striking demonstration came when I asked Gemini to schedule a rideshare for a fake flight. I had put a fictitious trip to San Francisco on my calendar for the next day, using real flight details, and gave Gemini a deliberately broad prompt: “schedule an Uber that would get me to the airport in time for my flight tomorrow.” Because it has access to my email and calendar, Gemini was able to dig up the flight details. It needed one small follow-up prompt, possibly because the flight information wasn’t in my email where it expected to find it, but it located the relevant data, proposed departure times of 11:30 AM or 11:45 AM (sensible for a 1:45 PM flight given how close I live to the airport), and asked me to confirm one. Once I did, it arranged the ride in about three minutes with no further input from me.

What makes this more impressive: Uber’s own terminology is to “reserve” a ride, not “schedule” one, and Gemini handled the mismatch anyway. That gets at the fundamental difference between the digital assistants we’ve grown used to and the AI assistants now emerging. Being able to talk to a computer in natural, conversational language genuinely improves the experience, whether you’re managing a smart home or ordering dinner. But if an AI routinely stumbles and demands clarification over minor semantic differences (mistaking “plate” for “combo” on a restaurant menu, say, or “slaw” for “shredded cabbage”), it’s no more useful than the basic assistants we’ve spent a decade asking to set timers and play music.

Despite these advancements, observing Gemini navigate and interact with an app like Uber Eats starkly highlights a crucial point: an application designed primarily for AI interaction would bear little resemblance to the human-centric interfaces prevalent today. An AI assistant, for instance, would not be swayed by a prominent advertisement offering a 30 percent discount, nor would an aesthetically pleasing, professionally staged photograph of a dish hold any more persuasive power than a low-quality image. Instead of visual clutter, an AI would ideally operate on a structured database, a concept the industry is actively pursuing through initiatives like the Model Context Protocol (MCP).
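To make that contrast concrete, here is a minimal sketch of what a machine-first ordering surface might look like, written in TypeScript. Everything in it is hypothetical: the MenuItem shape, the place_order tool, and its schema are illustrations of the MCP idea, not any real Uber Eats or MCP API.

```typescript
// Hypothetical machine-readable ordering surface. None of these names
// come from a real API; they illustrate the idea that an agent could
// consume structured data instead of a visual menu.

interface MenuItem {
  id: string;
  name: string;              // "Chicken Teriyaki"
  portion: "half" | "full";  // the half-portion increments Gemini had to infer
  priceCents: number;
  tags: string[];            // ["combo", "gluten-free"], not marketing photos
}

// An MCP-style tool declaration: the app advertises a typed capability,
// and the assistant calls it directly instead of tapping through screens.
// (Illustrative shape only, not the actual MCP specification.)
const placeOrderTool = {
  name: "place_order",
  description: "Assemble a food order up to the confirmation step.",
  inputSchema: {
    type: "object",
    properties: {
      restaurantId: { type: "string" },
      items: {
        type: "array",
        items: {
          type: "object",
          properties: {
            itemId: { type: "string" },
            quantity: { type: "integer", minimum: 1 },
          },
          required: ["itemId", "quantity"],
        },
      },
    },
    required: ["restaurantId", "items"],
  },
} as const;
```

Notice what has no place in this representation: banner ads, discount badges, and glossy photography simply have no field to occupy, which is exactly the point.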

Having a model squint at a human-designed interface to order a pizza feels both inefficient and fragile. It occasionally gets stuck, and it isn’t good at explaining why it failed. This version of task automation looks like an interim solution, a stopgap until app developers adopt more direct integration methods such as MCP or Android’s native app functions. Sameer Samat, Google’s head of Android, recently confirmed as much: Gemini uses this reasoning-based approach precisely because those more direct methods aren’t yet widely implemented. Whether this first iteration ends up as an early glimpse of what’s coming or simply as a prod for developers to build proper integrations, it stands as a significant first step toward a fundamentally new way of interacting with our mobile assistants. The journey is slow and a little clumsy right now, but it holds immense promise.
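For a rough sense of why the screen-reasoning stopgap is slow and brittle, here is a sketch of the observe-plan-act loop such a system implies. It assumes made-up captureScreen, askModel, and performAction helpers, and it is my own illustration of the general technique, not Google's implementation.

```typescript
// Hypothetical observe-plan-act loop for screen-driven automation.
// captureScreen, askModel, and performAction are assumed helpers,
// not real Android or Gemini APIs.

type UiAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; dy: number }
  | { kind: "done"; summary: string };

declare function captureScreen(): Promise<Uint8Array>; // screenshot bytes
declare function askModel(goal: string, screen: Uint8Array): Promise<UiAction>;
declare function performAction(action: UiAction): Promise<void>;

async function runTask(goal: string, maxSteps = 50): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const screen = await captureScreen();        // observe the pixels
    const action = await askModel(goal, screen); // reason over the screenshot
    if (action.kind === "done") return action.summary;
    await performAction(action);                 // act, then loop again
  }
  // Each step costs a full screenshot plus a model round trip, which is
  // why a nine-minute dinner order is plausible, and why one misread
  // screen can send the loop hunting for greens it already scrolled past.
  throw new Error(`Gave up on "${goal}" after ${maxSteps} steps`);
}
```

A direct tool call of the kind sketched earlier collapses this entire loop into a single request, which is the practical promise of MCP-style integration.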
