DeepSeek’s New AI Model May Have Used Google’s Gemini Data

Published September 13, 2025 · 1 min read

Originally reported by TechCrunch

Last week, the Chinese AI lab DeepSeek released R1-0528, an updated version of its R1 reasoning model that performs impressively on math and coding tasks. The company has not disclosed the data used to train the model, however, leading some AI researchers to speculate that DeepSeek may have trained it on data from Google’s Gemini family of models.

Melbourne-based developer Sam Paech published a post suggesting that the language patterns in R1-0528 resemble those of Google’s Gemini 2.5 Pro. Other developers noted that the model’s "thought processes", the reasoning traces it produces while working toward an answer, read like the traces typically generated by Gemini models.

This isn’t the first time DeepSeek has faced accusations of using data from rival AI models. In December, its V3 model was found to often identify itself as ChatGPT, suggesting it may have been trained on ChatGPT chat logs. Earlier this year, OpenAI said it had found evidence that DeepSeek used distillation, a technique for training a model on the outputs of larger, more capable models. Distillation is not uncommon, but OpenAI’s terms of service prohibit using its models’ outputs to train competing models.

AI models often converge on similar words and phrases because so much content on the web is now AI-generated. Even so, experts including Nathan Lambert of AI2 consider it plausible that DeepSeek used data from Google’s Gemini models. Distillation is particularly attractive to companies like DeepSeek, which may have ample funding but limited resources for large-scale AI training.

To counter such practices, companies like Google and OpenAI have begun implementing security measures, such as requiring ID verification for access to certain models and summarizing the traces their models generate so that the raw traces cannot be extracted. Google has not yet commented on the matter, but the ongoing debate highlights growing concerns around AI data usage and security.

#news

Editorial Staff, Editor

The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.

