Journal editors and peer reviewers are facing an unprecedented challenge, inundated with AI-generated academic papers that are increasingly difficult to distinguish from legitimate research.
Last summer, Peter Degen’s postdoctoral supervisor came to him with an unusual observation: one of his papers was drawing an unexpectedly high number of citations. Academic citations are normally welcome, but the sheer volume and pace were suspicious. Published in 2017, the paper, which assessed the accuracy of a particular statistical analysis on epidemiological data, had accumulated a modest few dozen citations over the years. Now it was being referenced hundreds of times, almost daily, vaulting it among the most cited works of his career. Rather than celebrate, Degen’s adviser asked him to investigate.
Degen, a postdoctoral researcher at the University of Zurich Center for Reproducible Science and Research Synthesis, discovered a consistent pattern among the citing papers. Like the original, they all analyzed the Global Burden of Disease study, a publicly accessible dataset managed by the Institute for Health Metrics and Evaluation at the University of Washington. But the new papers were mining the dataset to churn out a seemingly infinite stream of predictions following a generic "disease X in population Y" formula: the future likelihood of stroke in adults over 20, testicular cancer in young adults, falls among the elderly in China, colorectal cancer in people who eat little whole grain.
His search on GitHub for code used in such analyses led Degen to the Chinese social media platform Bilibili. There, he uncovered a Guangzhou-based company promoting tutorials on how to generate publishable research in under two hours using its proprietary software and AI writing tools. The resulting studies were often subpar: researchers who analyzed a subset on headaches found them riddled with errors and misrepresentations. But they were no longer as glaringly flawed as earlier AI-generated papers, making them harder to filter out.
“It’s a huge burden on the peer-review system, which is already at the limit,” Degen stated. “There’s just too many papers being published and there’s not enough peer reviewers, and if the LLMs make it so much easier to mass produce papers, then this will reach a breaking point.”
Despite optimists holding high hopes for generative AI’s potential to catalyze future scientific breakthroughs—promising accelerated discovery and the eradication of diseases—the technology is currently undermining a fundamental pillar of scientific research. It is swamping editors and reviewers with an incessant flow of papers, creating a paradox: the better the AI becomes at producing competent papers, the more severe the crisis.
For the past decade, academic publishing has been battling "paper mills"—black-market entities that mass-produce research papers and sell authorship slots to academics, doctors, or others seeking a competitive edge through published work. This struggle has been a continuous game of cat and mouse, with publishers, often prompted by "science sleuths" specializing in identifying fraudulent research, patching one vulnerability only for the mills to exploit another. Generative AI initially provided a boost to these mills, helping them bypass plagiarism detectors by creating entirely new images and text. However, the technology’s characteristic "hallucinations" meant publishers could, in theory, screen out much of their output. In practice, papers still slipped through, only to be retracted later when sleuths encountered absurdities like a diagram of a rat with inexplicably gargantuan genitals labeled “testtomcels” or prose containing “as an AI assistant” phrases that had been overlooked.
Now, however, AI has advanced to a point where it can generate convincing papers almost entirely, empowering desperate academics to "mill" their own publications. The consequence is a flood of scientific "slop" that threatens to overwhelm the entire ecosystem of publishing, peer review, grant allocation, and the foundational research system as it currently exists.
Matt Spick, a lecturer in health and biomedical data analytics at the University of Surrey and an associate editor at *Scientific Reports*, first observed this trend when he received three remarkably similar papers analyzing the US National Health and Nutrition Examination Survey (NHANES), another public dataset. A quick check on Google Scholar confirmed his suspicion: there had been a sudden explosion in papers citing NHANES, all following a similar template, each claiming to discover an association between, for instance, eating walnuts and cognitive function or consuming skim milk and depression.
“If you’ve got enough computing power, you go through and you measure every single pairwise association, and eventually you find some that haven’t been written on before and you just publish: There is a correlation between this and that,” Spick explained. Such correlations often represent misleading oversimplifications of multi-causal phenomena or mere statistical flukes. He cited an example: “One was that how many years you spend in education will cause postoperative hernia complications. That is just a random correlation. What am I supposed to do with that? Leave school early so that I won’t get a postoperative hernia complication later?”
Over the years, sleuths have developed various techniques for detecting inauthentic papers. Some look for "tortured phrases," telltale signs that an existing paper was run through a synonym generator to evade plagiarism detectors, turning technical terms into nonsensical equivalents: "reinforcement learning," for instance, becomes "reinforcement getting to know." Other methods include tracking duplicated images, performing network analysis of authors, and checking citations for hallucinated publications, a classic indicator of large language model (LLM) use. Spick’s approach is to identify large batches of papers that follow identical templates while analyzing public datasets.
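Screening of this kind can be mechanically simple. The sketch below is a toy version of the idea: it scans text against a lookup table of known tortured phrases (the table here is a small illustrative sample, not the curated lists of thousands of fingerprints that real sleuthing tools maintain):

```python
# Illustrative sample of synonym-mangled technical terms mapped back
# to the standard term they were likely generated from.
TORTURED_PHRASES = {
    "reinforcement getting to know": "reinforcement learning",
    "profound learning": "deep learning",
    "counterfeit consciousness": "artificial intelligence",
    "irregular woodland": "random forest",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, likely original term) pairs found in text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

abstract = "We apply reinforcement getting to know with an irregular woodland baseline."
print(flag_tortured_phrases(abstract))  # flags both mangled terms
```

Simple substring matching like this catches only already-cataloged phrases, which is part of the cat-and-mouse dynamic: each new evasion tactic has to be spotted by a human before it can be added to the list.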
These papers, while often misleading, are not necessarily factually incorrect, nor are they strictly fraudulent. They are simply useless and, crucially, now very easy to produce. Last year, several journals began restricting submissions of papers that analyze public datasets, citing an overwhelming surge of redundant research.
Spick fears these measures may be insufficient, akin to "fighting the last battle." In recent months, AI companies have unveiled a new generation of “agentic” science assistants capable of analyzing data, generating hypotheses, and writing research papers with a high degree of autonomy. While these systems could be a step toward AI-accelerated science, they also introduce novel risks. When researchers at Carnegie Mellon tested several agentic tools, they found that these systems sometimes invented data or employed misleading techniques. These errors, however, were only discernible upon meticulous analysis of the entire workflow; the final papers appeared polished and credible.
Earlier this year, when announcing an AI paper writing assistant, OpenAI’s then-vice president for science, Kevin Weil, predicted, “I think 2026 will be for AI and science what 2025 was for AI and software engineering.” Curious about its capabilities, Spick and his colleagues provided the tool, named Prism, with data from an already published paper documenting eggplant and pepper ripening times. Prism analyzed the data, proposed a new statistical method applicable to it, and generated an entire paper, complete with charts and accurate citations.
“We were all looking at each other like, ‘What the [expletive], this is actually a decent piece of work!’” Spick recounted. Unlike the template-driven papers he had previously encountered, this one did not follow a pre-defined structure nor did it rely on a single, well-known database. It produced the full paper in just 25 minutes and 50 seconds.
“I’m genuinely not sure at what point we will suddenly realize that more are getting through than we realize because we can’t easily tell the difference anymore,” Spick admitted.
This development raises profound philosophical questions, Spick noted, such as whether the authorship—human or AI—matters if the information presented is accurate, and if science should be in the business of publishing every conceivable fact.
“Part of science is supposed to be the filter. We’re supposed to publish the stuff that we think is interesting, not publish literally everything that we can possibly find,” Spick emphasized. “Because if we do that, science is just spamming the world with all the data, irrespective of whether it constitutes actual new knowledge or not, and in any kind of medium-term time frame, it’s almost impossible to work out what’s meaningful and what isn’t.”
This represents the immediate practical challenge posed by AI agents: they threaten to overwhelm the human systems responsible for creating and organizing knowledge. Research funders are now contending with barrages of proposals meticulously tailored to their specific grants, struggling to discern which projects represent years of dedicated work and which were generated in mere minutes. Conference organizers, journal editors, and peer reviewers alike are grappling with an influx of material that, at first glance, appears sufficiently credible to warrant a thorough review. There is a growing and enormous asymmetry between the time required to produce new work and the time a subject-matter expert needs to vet it.
For Marit Moe-Pryce, the managing editor of the international relations journal *Security Dialogue*, submissions have doubled compared with the previous year. Equally problematic, the surface quality of submissions is now uniformly high. The days of blatant hallucinations and visible AI prompts are gone; everything is coherent, well structured, and stylistically consistent, making it difficult to tell whether a paper is entirely AI-generated, the work of an experienced academic, or that of a young scholar using AI as an editing tool.
“The main problem that we see currently from the desk is that the fraudulent side and the academic side are conflating, which ends up with a big gray mass of articles that we as editors need to sit and try to figure out, ‘What is this? Is this something that we need to engage with? Is it not?’” Moe-Pryce explained.
One particular paper managed to bypass at least 10 editors and two rounds of peer review before Moe-Pryce detected a fabricated citation—a highly plausible one, involving several former editors of the journal on a topic they might have written about but never did. She subsequently uncovered several more. While she remains unsure at what stage of revision these hallucinations were introduced, the near-miss underscored the meticulous attention now required to prevent false information from being published. With models increasingly citing real papers, she now has to scrutinize whether the cited works are those an expert would genuinely reference, as AI has yet to master the distinction between canonical literature and more peripheral contributions.
“It’s incredibly detailed, and this is a normal part of the editorial work. The difference is that now you have to do that for all the rubbish that comes through the door,” Moe-Pryce lamented. “That’s why our workload becomes so unmanageable.”
Academic papers typically undergo a multistage review before publication. A manuscript is first triaged for obvious problems, then forwarded to the journal’s editor, who assesses its potential for publication. The editor passes it to an associate editor, an expert in the field, who vets it further before enlisting two or three subject-matter specialists—the "peers" in peer review—to read the manuscript and provide feedback. Editors and reviewers usually volunteer their time, working unpaid on top of their primary academic responsibilities.
The existing review system was already struggling under a rising tide of submissions. Now, AI is not only increasing these volumes but also making substandard papers significantly harder to filter out. Moe-Pryce now dedicates more time to pre-screening papers before deciding which to send for review. Concurrently, prospective reviewers, themselves overwhelmed, are increasingly less likely to respond. Where she once could send out four queries and receive three replies, it now takes her a dozen attempts to secure two reviewers. Increasingly, she reaches out to 20 reviewers and hears nothing back.
“It’s fatigue. Academic journals have mushroomed, and then you have AI helping everyone fraudulent or not generate more, faster, so you have a massive increase in volume,” she explained. “AI currently holds the potential to bring down the publishing system as we know it.”
The journal *Accountability in Research* has seen a 60 percent surge in submissions this year, according to David Resnik, an associate editor. Ironically, he has been besieged by what appear to be AI-generated papers about fraudulent academic papers, apparently built by mining public data compiled by the organization Retraction Watch.
Resnik, too, is struggling to find reviewers. At times, he has had to dispatch 20 requests merely to elicit two responses—and he suspects that some of the responses he has received might themselves be AI-generated. His suspicion is not unfounded: a survey conducted last year by the publishing company Frontiers revealed that over half of researchers have utilized AI assistance in their peer review activities.
“I’m very worried about this straining, breaking the back of the peer-review system,” Resnik stated.
The arrival of AI agents coincides with a period when academia’s quality filters are already strained by an overabundance of papers. The number of scientific papers published has grown exponentially in recent years, according to an analysis of data published in *Quantitative Science Studies*, while the number of PhDs available to review them has not kept pace. Unfortunately, the authors attribute this explosion in productivity not to rapid scientific progress but to commercial and professional incentives that align to maximize paper quantity.
Many journals have transitioned to an "open access" model, where they generate revenue by charging authors processing fees for publication, rather than relying on subscriptions. In earnings calls, publishing companies frequently highlight recent submission increases of 20 percent or more as a positive growth indicator. Meanwhile, universities and funding agencies evaluate researchers’ publication metrics when making funding or promotion decisions, placing researchers under immense pressure to "publish or perish." This pressure isn't limited to traditional academics; international medical students can enhance their chances for a US residency program by having several peer-reviewed papers on their CV. In China, medical doctors face strong incentives to publish despite often lacking the time or resources to conduct original research, making rapid paper generation an appealing option.
Introducing an infinite paper-writing machine into a system that defines productivity by the volume of papers produced inevitably leads to its extensive use for generating a multitude of publications. A study published in *Nature* this year...