Google’s AI Overview, a centerpiece of its revamped search experience, has demonstrated a surprising inability to correctly count letters within words. When asked how many 'P's are in "Google," the AI confidently responds with "two," an incorrect assessment.
This challenge extends to numerous other basic linguistic tasks. The AI Overview stated there is “exactly 1 ‘r’ in the word ‘poop’,” and claimed "journalism" contains two 'd's, even when spelling it out as "j-o-u-r-n-a-d-i-s-m." While it managed to identify a single 'P' in the U.S. president's last name, it then misspelled it as "t-r-p-u-m," highlighting a pervasive issue with fundamental spelling and letter recognition.
The current struggles of Google’s AI-driven search overhaul were, for many, an anticipated outcome. This isn't the first time Google has integrated AI Overviews into Search with problematic results. Previous iterations notoriously treated satirical articles from The Onion and joke posts on Reddit as fact, leading to absurd and even dangerous advice, such as recommending eating rocks or adding glue to pizza.
Given Google’s intensified commitment to embedding generative AI at the core of its nearly three-decade-old flagship product, these recent stumbles, though concerning, are not entirely unforeseen.
The shift is a significant one: Google is rebuilding its entire search engine around this AI-first approach.
Addressing the specific errors, Google acknowledged the challenge in a statement to TechCrunch, saying, “Counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue.”
Such basic spelling inaccuracies might seem familiar to those acquainted with large language models (LLMs), the AI technology powering chatbots and text generators. Despite their capacity to code complex applications in seconds or solve long-standing mathematical problems, LLMs are fundamentally not designed to comprehend spelling in the human sense. It has been a long-running industry joke that a reliable test for any new AI model is to ask it how many 'r's are in "strawberry," since these models often spell about as well as a kindergartener.
Google’s AI Overview issues, however, transcend simple spelling mistakes. Last week, an error was patched in which searching for the word "disregard" produced a dictionary-style panel that, instead of a definition, read, “Understood. Let me know whenever you have a new prompt or question!” Yet the spelling errors persist, often amusingly, because they are inherently difficult to eradicate.
Researchers have previously explained the technical reasons behind these spelling conundrums: AI does not process sentences as units of language composed of words and letters. Many LLMs are built on the transformer architecture and operate on text that has first been broken down into "tokens." These tokens can represent anything from full words to syllables or individual letters, depending on the model. Instead of "reading" in a human-like manner, the AI converts text into numerical representations, which are then contextualized to generate a logical response.
Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, explained to TechCrunch: “LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding. When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”
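To make the point about encodings concrete, here is a minimal Python sketch. It is a toy, not how any production model works: the three-entry vocabulary, the token IDs, and the `toy_tokenize` helper are all invented for illustration, and real tokenizers learn much larger vocabularies from data. It only shows why a system that receives token IDs has no direct view of individual letters, while ordinary code that sees the characters can count them trivially.

```python
# Toy illustration only: this vocabulary and its token IDs are invented.
# Real tokenizers (for example, byte-pair encoding) are learned from data.
TOY_VOCAB = {"straw": 101, "berry": 102, "the": 103}


def toy_tokenize(text, vocab):
    """Greedy longest-match split of `text` into token IDs from `vocab`."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry matches at this position.
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


word = "strawberry"
token_ids = toy_tokenize(word, TOY_VOCAB)

# What the model receives: two opaque integers, with no letters in sight.
print(token_ids)        # [101, 102]

# What ordinary code sees: the characters themselves.
print(word.count("r"))  # 3
```

In this sketch, "strawberry" reaches the model as the pair `[101, 102]`. Nothing in those two numbers says how many 'r's the word contains, so the model must have learned that fact some other way rather than reading it off the input.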
This token-based design, which underpins the LLMs behind products like Google’s AI Overview, inherently limits their ability to process text at a granular, letter-by-letter level. Consequently, researchers have expressed skepticism about the feasibility of fully resolving the spelling problem.
Sheridan Feucht, a PhD student at Northeastern University specializing in large language model interpretability, elaborated to TechCrunch: “It’s kind of hard to get around the question of what exactly a ‘word’ should be for a language model, and even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to ‘chunk’ things even further. My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”
Solving spelling isn't necessarily a top priority for researchers, given that the primary utility of LLMs lies elsewhere. Still, these glaring failures are a useful reminder that AI is not infallible, however omniscient it can seem. They underscore the importance of human verification and of corroborating AI outputs rather than trusting them blindly.
The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
