Last week, Chinese AI startup DeepSeek rolled out an updated version of its reasoning model, R1-0528—and it’s already making waves for its strong performance in math and coding tasks. But the bigger buzz? Whispers in the AI world suggest DeepSeek may have trained the model using outputs from Google’s Gemini.
Although DeepSeek has stayed tight-lipped about its training data, independent AI researchers are raising eyebrows. One of them is Sam Paech, a Melbourne-based developer who designs emotional intelligence tests for AI. In a post on X (formerly Twitter), Paech shared what he calls “evidence” that R1-0528 closely mirrors Gemini 2.5 Pro in word choices and phrasing.
“It’s not conclusive proof,” Paech admits, “but the patterns are striking.”
He’s not the only one noticing similarities. A pseudonymous developer behind SpeechMap, a project that evaluates free speech in AI models, pointed out that the “thought traces” from DeepSeek’s model—basically its reasoning steps—feel eerily similar to those of Gemini.
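To give a feel for what "mirroring word choices" means in practice, here is a minimal sketch of a surface-level stylistic comparison. This is not Paech's actual methodology, and the sample responses are hypothetical; the idea is simply to build a word-frequency fingerprint from each model's outputs and measure how closely the two align.

```python
# Minimal sketch of a stylistic-fingerprint comparison between two models.
# Sample responses below are hypothetical placeholders, not real outputs.
from collections import Counter
import math

def fingerprint(texts):
    """Count lowercase word frequencies across a list of model responses."""
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

model_a = ["Certainly! Let's break this down step by step."]
model_b = ["Certainly! Let's work through this step by step."]
print(cosine_similarity(fingerprint(model_a), fingerprint(model_b)))
```

In real analyses, researchers look at far larger samples and more telling signals, such as distinctive multi-word phrasings, but high overlap of this kind is what prompts the "striking patterns" observation.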
A Pattern of Controversy?
This isn’t the first time DeepSeek has been in hot water over its training methods. Back in December, developers noticed that an earlier DeepSeek model, V3, would sometimes call itself ChatGPT—OpenAI’s flagship chatbot. That raised concerns that DeepSeek may have trained on logs from ChatGPT conversations.
Fast forward to earlier this year, and the accusations intensified. According to OpenAI, it found evidence that DeepSeek used distillation, a technique in which a smaller model is trained on the outputs of a larger, more capable one. Microsoft, a key OpenAI partner, also detected large amounts of data being pulled through OpenAI developer accounts in late 2024, accounts OpenAI believes were linked to DeepSeek.
Distillation itself is a common and perfectly legal technique, but in this case it would break the rules: OpenAI's terms of service explicitly forbid customers from using its model outputs to build competing AI systems.
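For readers unfamiliar with the technique, here is a minimal, self-contained sketch of classic knowledge distillation in PyTorch. Everything here is a toy stand-in: the "teacher" and "student" are single linear layers and the data is random noise. In the alleged scenario, the teacher's outputs would come from a commercial API rather than local weights.

```python
# Toy sketch of knowledge distillation: a small "student" model learns to
# match the output distribution of a larger, frozen "teacher" model.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000
teacher = nn.Linear(64, VOCAB)   # stand-in for a large frozen model
student = nn.Linear(64, VOCAB)   # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 64)          # toy input batch
    with torch.no_grad():
        teacher_logits = teacher(x)  # in practice: API responses, not local weights
    student_logits = student(x)
    # Classic distillation loss: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```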
Why It’s Getting Harder to Tell Who Trained on What
Part of the problem is that today's internet is flooded with AI-generated content. From spammy Reddit posts to clickbait blogs, machine-written text has seeped into nearly every corner of the web, making it increasingly difficult to assemble clean, purely human-written training data. And because many models now learn from the same contaminated pool, they tend to converge on similar word choices and phrasing, which muddies any attempt to attribute a turn of phrase to a single source.
As Nathan Lambert, a researcher at AI2 (Allen Institute for AI), put it:
“If I were DeepSeek, I’d absolutely create tons of synthetic data using the best model I could access. They’ve got cash but limited GPU access. It’s practically free compute for them.”
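The workflow Lambert describes is straightforward to sketch. Assuming access to the OpenAI Python SDK and an API key, generating synthetic training pairs might look something like this; the model name and prompts are placeholders, not anything DeepSeek is known to have used.

```python
# Sketch of the synthetic-data workflow Lambert describes: prompt a strong
# commercial model and save its answers as training pairs for a smaller model.
# Requires an OPENAI_API_KEY in the environment; prompts are placeholders.
import json
from openai import OpenAI

client = OpenAI()
prompts = ["Explain the chain rule.", "Write a binary search in Python."]

with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder for "the best model I could access"
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"prompt": prompt,
                "completion": response.choices[0].message.content}
        f.write(json.dumps(pair) + "\n")
```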
How AI Giants Are Fighting Back
To curb distillation, companies like OpenAI, Google, and Anthropic are tightening their defenses.
- In April, OpenAI began requiring organizations to complete an ID verification process before accessing certain advanced models; the check only accepts IDs from supported countries, a list that excludes China.
- Google, meanwhile, has begun summarizing the raw reasoning traces of models available through its AI Studio platform, making it harder for rivals to train competing models on those traces.
- Anthropic followed suit in May, citing the need to “protect competitive advantages.”
We’ve contacted Google for a statement and will update if they respond.
Final Thought
While there’s no concrete proof yet that DeepSeek copied Gemini, the signals are strong enough to keep the speculation alive. In an AI world racing toward ever-better models, where the lines between inspiration and imitation blur, the real challenge may be figuring out where innovation ends—and appropriation begins.