Last week, Chinese AI startup DeepSeek rolled out an updated version of its reasoning model, R1-0528—and it’s already making waves for its strong performance in math and coding tasks. But the bigger buzz? Whispers in the AI world suggest DeepSeek may have trained the model using outputs from Google’s Gemini.
Although DeepSeek has stayed tight-lipped about its training data, independent AI researchers are raising eyebrows. One of them is Sam Paech, a Melbourne-based developer who designs emotional intelligence tests for AI. In a post on X (formerly Twitter), Paech shared what he calls “evidence” that R1-0528 closely mirrors Gemini 2.5 Pro in word choices and phrasing.
“It’s not conclusive proof,” Paech admits, “but the patterns are striking.”
He’s not the only one noticing similarities. A pseudonymous developer behind SpeechMap, a project that evaluates free speech in AI models, pointed out that the “thought traces” from DeepSeek’s model—basically its reasoning steps—feel eerily similar to those of Gemini.
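To give a feel for what "mirroring word choices" means in practice, here is a minimal sketch of a surface-level stylistic comparison. This is not Paech's actual methodology, and the sample responses are hypothetical; the idea is simply to build a word-frequency fingerprint from each model's outputs and measure how closely the two align.

```python
# Minimal sketch of a stylistic-fingerprint comparison between two models.
# Sample responses below are hypothetical placeholders, not real outputs.
from collections import Counter
import math

def fingerprint(texts):
    """Count lowercase word frequencies across a list of model responses."""
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

model_a = ["Certainly! Let's break this down step by step."]
model_b = ["Certainly! Let's work through this step by step."]
print(cosine_similarity(fingerprint(model_a), fingerprint(model_b)))
```

In real analyses, researchers look at far larger samples and more telling signals, such as distinctive multi-word phrasings, but high overlap of this kind is what prompts the "striking patterns" observation.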
A Pattern of Controversy?
This isn’t the first time DeepSeek has been in hot water over its training methods. Back in December, developers noticed that an earlier DeepSeek model, V3, would sometimes call itself ChatGPT—OpenAI’s flagship chatbot. That raised concerns that DeepSeek may have trained on logs from ChatGPT conversations.
Fast forward to earlier this year, and the accusations intensified. According to OpenAI, it found evidence that DeepSeek used distillation, a technique in which a smaller model is trained on the outputs of a larger, more capable one. Microsoft, a key OpenAI partner, also detected large amounts of data being pulled through OpenAI developer accounts in late 2024, accounts OpenAI believes were linked to DeepSeek.
Distillation itself is a common and perfectly legal technique, but in this case it would break the rules: OpenAI's terms of service explicitly forbid customers from using its model outputs to build competing AI systems.
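For readers unfamiliar with the technique, here is a minimal, self-contained sketch of classic knowledge distillation in PyTorch. Everything here is a toy stand-in: the "teacher" and "student" are single linear layers and the data is random noise. In the alleged scenario, the teacher's outputs would come from a commercial API rather than local weights.

```python
# Toy sketch of knowledge distillation: a small "student" model learns to
# match the output distribution of a larger, frozen "teacher" model.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000
teacher = nn.Linear(64, VOCAB)   # stand-in for a large frozen model
student = nn.Linear(64, VOCAB)   # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 64)          # toy input batch
    with torch.no_grad():
        teacher_logits = teacher(x)  # in practice: API responses, not local weights
    student_logits = student(x)
    # Classic distillation loss: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```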
Why It’s Getting Harder to Tell Who Trained on What
Part of the problem is that today's internet is flooded with AI-generated content. From spammy Reddit posts to clickbait blogs, machine-written text has seeped into nearly every corner of the web, making it increasingly difficult to assemble clean, purely human-written training data. And because many models now learn from the same contaminated pool, they tend to converge on similar word choices and phrasing, which muddies any attempt to attribute a turn of phrase to a single source.
As Nathan Lambert, a researcher at AI2 (Allen Institute for AI), put it:
“If I were DeepSeek, I’d absolutely create tons of synthetic data using the best model I could access. They’ve got cash but limited GPU access. It’s practically free compute for them.”
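The workflow Lambert describes is straightforward to sketch. Assuming access to the OpenAI Python SDK and an API key, generating synthetic training pairs might look something like this; the model name and prompts are placeholders, not anything DeepSeek is known to have used.

```python
# Sketch of the synthetic-data workflow Lambert describes: prompt a strong
# commercial model and save its answers as training pairs for a smaller model.
# Requires an OPENAI_API_KEY in the environment; prompts are placeholders.
import json
from openai import OpenAI

client = OpenAI()
prompts = ["Explain the chain rule.", "Write a binary search in Python."]

with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder for "the best model I could access"
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"prompt": prompt,
                "completion": response.choices[0].message.content}
        f.write(json.dumps(pair) + "\n")
```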
How AI Giants Are Fighting Back
To curb distillation, companies like OpenAI, Google, and Anthropic are tightening their defenses.
- In April, OpenAI began requiring organizations to complete an ID verification process before accessing certain advanced models; the check only accepts IDs from supported countries, a list that excludes China.
- Google, meanwhile, has begun summarizing the raw reasoning traces of models available through its AI Studio platform, making it harder for rivals to train competing models on those traces.
- Anthropic followed suit in May, citing the need to “protect competitive advantages.”
We’ve contacted Google for a statement and will update if they respond.
Final Thought
While there’s no concrete proof yet that DeepSeek copied Gemini, the signals are strong enough to keep the speculation alive. In an AI world racing toward ever-better models, where the lines between inspiration and imitation blur, the real challenge may be figuring out where innovation ends—and appropriation begins.