Sep 13

DeepSeek’s New AI Model May Have Used Google’s Gemini Data


Originally reported by TechCrunch
Last week, the Chinese AI lab DeepSeek unveiled R1-0528, an updated version of its R1 reasoning model that performs well on math and coding tasks. The company has not disclosed the data used to train the model, which has led some AI researchers to speculate that DeepSeek used outputs from Google's Gemini family of models.

Melbourne-based developer Sam Paech published a post suggesting that the language patterns in R1-0528 resemble those of Google's Gemini 2.5 Pro. Other developers noted that the model's "thought processes" (the reasoning traces it produces while working toward an answer) read like the traces typically generated by Gemini models.

This isn't the first time DeepSeek has faced accusations of using data from rival AI models. In December, developers observed that its V3 model often identified itself as ChatGPT, suggesting it may have been trained on OpenAI's chatbot logs. Earlier this year, OpenAI said it had found evidence that DeepSeek used distillation, a technique in which a model is trained on the outputs of a more capable model. Distillation is not uncommon, but OpenAI's terms of service prohibit using its outputs to train competing models.

AI models often converge on similar wording and phrasing because so much AI-generated content now circulates on the web, so stylistic resemblance alone is not proof. Even so, experts including Nathan Lambert of AI2 consider it plausible that DeepSeek used data from Google's Gemini models. Distillation is particularly attractive to companies like DeepSeek, which may have limited computing resources for large-scale training but ample funding.

To counter such practices, companies like Google and OpenAI have begun implementing security measures, such as requiring ID verification for access to certain models and summarizing the traces their models generate so that the raw reasoning is harder to extract. Google has not yet commented on the matter, but the ongoing debate highlights growing concerns around AI data usage and security.
#news
Editorial Staff, Editor

The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
