Flash floods rank among the world's deadliest weather phenomena, claiming over 5,000 lives annually. They are also notoriously challenging to predict. However, Google believes it has found an innovative solution to this complex problem: leveraging global news reports.
While extensive weather data exists, flash floods are brief and highly localized, so they are rarely measured comprehensively the way temperature or river flows are continuously monitored. This data gap has kept deep learning models, which are increasingly proficient at general weather forecasting, from accurately predicting flash floods.
To overcome this hurdle, Google researchers used Gemini, the company’s large language model, to analyze 5 million news articles from around the world. The process identified reports of 2.6 million distinct floods and turned them into a geo-tagged time series called "Groundsource." Gila Loike, a Google Research product manager, said this is the first time the company has used language models for this kind of work. The research and its accompanying data set were made publicly available on Thursday morning.
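The article does not say how the extracted reports are deduplicated or mapped, so the following is only a minimal sketch of the general idea. It assumes the language model has already turned each article into a record with a date and coordinates. The `FloodReport` type, the `cell_size_deg` grid and the rule of one event per grid cell per day are illustrative assumptions, not Groundsource's actual schema or method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
import math


@dataclass(frozen=True)
class FloodReport:
    """Hypothetical record an LLM might extract from one news article."""
    article_id: str
    day: date
    lat: float
    lon: float


def grid_cell(lat, lon, cell_size_deg=0.05):
    """Snap coordinates to an integer grid cell index (assumed grid size)."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def build_time_series(reports, cell_size_deg=0.05):
    """Collapse reports into one event per (cell, day), ordered by day."""
    events = defaultdict(set)  # (cell, day) -> ids of articles reporting it
    for r in reports:
        key = (grid_cell(r.lat, r.lon, cell_size_deg), r.day)
        events[key].add(r.article_id)

    series = defaultdict(list)  # cell -> [(day, number_of_articles), ...]
    for (cell, day), ids in events.items():
        series[cell].append((day, len(ids)))
    for cell in series:
        series[cell].sort()
    return dict(series)


# Made-up example: two articles about the same flood, one about a later flood.
reports = [
    FloodReport("a1", date(2024, 4, 2), -25.966, 32.583),
    FloodReport("a2", date(2024, 4, 2), -25.967, 32.584),
    FloodReport("a3", date(2024, 4, 9), -25.966, 32.583),
]
series = build_time_series(reports)
```

In this toy input, the two articles from April 2 fall in the same grid cell and count as a single flood event, which is one simple way that millions of articles could reduce to a smaller number of distinct floods.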
Using Groundsource as a real-world foundational dataset, the researchers then trained a model built on a Long Short-Term Memory (LSTM) neural network. The model takes in global weather forecasts and outputs the probability of a flash flood in a given area.
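To make the LSTM step concrete, here is a minimal sketch of how such a network reads a sequence of forecast values and emits a single probability. It is not Google's model: the `TinyLSTM` class, the two input features (rainfall and soil moisture) and the layer size are assumptions chosen for illustration, and the weights are random and untrained, so the output has no predictive meaning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class TinyLSTM:
    """Single-layer LSTM with a sigmoid output head (untrained, illustrative)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        size = n_inputs + n_hidden

        def matrix(rows):
            return [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(rows)]

        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {name: matrix(n_hidden) for name in "ifoc"}
        self.b = {name: [0.0] * n_hidden for name in "ifoc"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = x + h  # concatenate the current input with the previous hidden state
        pre = {
            name: [a + b for a, b in zip(dot(self.W[name], z), self.b[name])]
            for name in "ifoc"
        }
        i = [sigmoid(v) for v in pre["i"]]       # how much new information to write
        f = [sigmoid(v) for v in pre["f"]]       # how much old memory to keep
        o = [sigmoid(v) for v in pre["o"]]       # how much memory to expose
        cand = [math.tanh(v) for v in pre["c"]]  # candidate memory content
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def flood_probability(self, forecast_sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in forecast_sequence:
            h, c = self.step(list(x), h, c)
        logit = sum(w * hk for w, hk in zip(self.w_out, h))
        return sigmoid(logit)  # squashed into the range (0, 1)


# Hypothetical forecast steps: (rainfall in mm, soil moisture fraction).
forecast = [(2.0, 0.4), (15.0, 0.7), (40.0, 0.9)]
model = TinyLSTM(n_inputs=2, n_hidden=4)
p = model.flood_probability(forecast)
```

The cell state `c` is what lets an LSTM carry information, such as rain that fell several steps earlier, forward through the sequence. In a real system the weights would be learned by comparing predicted probabilities against recorded flood events.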
Google's flash flood forecasting model now identifies risks for urban centers across 150 countries through the company’s Flood Hub platform. Its data is also being shared with emergency response organizations around the world. António José Beleza, an emergency response official at the Southern African Development Community who helped trial the model, said it helped his organization respond to floods significantly faster.
The model does have limitations. Its resolution is fairly coarse: it indicates risk across areas of 20 square kilometers. It is also less precise than the US National Weather Service’s flood alert system, partly because Google's model does not incorporate local radar data, which is crucial for tracking precipitation in real time.
A core objective of the project, however, was to build something that works in regions where local governments cannot afford expensive weather-sensing infrastructure or have only limited historical meteorological data.
Juliet Rothenberg, a program manager on Google’s Resilience team, explained to reporters this week, "Because we’re aggregating millions of reports, the Groundsource data set actually helps rebalance the map. It enables us to extrapolate to other regions where there isn’t as much information."
Rothenberg said the team hopes that using large language models to turn qualitative, written sources into quantitative data can be extended to other ephemeral but important phenomena that need forecasts, such as heat waves and mudslides.
Marshall Moutenot, CEO of Upstream Tech, a company that uses similar deep learning models to forecast river flows for clients such as hydropower companies, said Google's contribution is part of a growing effort to compile data for deep learning-based weather forecasting models. Moutenot also co-founded dynamical.org, an initiative that curates machine learning-ready weather data for researchers and startups.
"Data scarcity is one of the most difficult challenges in geophysics," Moutenot observed. He elaborated, "Simultaneously, there’s too much Earth data, and then when you want to evaluate against truth, there’s not enough. This was a really creative approach to get that data."
The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing.