Thus far, even AI companies have had trouble coming up with tools that can reliably detect when a piece of writing was generated using a large language model. Now, a group of researchers has established a novel method for estimating LLM usage across a large set of scientific writing by measuring which “excess words” started showing up much more frequently during the LLM era (i.e., 2023 and 2024). The results “suggest that at least 10% of 2024 abstracts were processed with LLMs,” according to the researchers.
In a preprint paper posted earlier this month, four researchers from Germany’s University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. Taking a similar look at “excess word usage” after LLM writing tools became widely available in late 2022, the researchers found that “the appearance of LLMs led to an abrupt increase in the frequency of certain style words” that was “unprecedented in both quality and quantity.”
Getting deeper
To measure these vocabulary changes, the researchers analyzed 14 million abstracts of papers published in PubMed between 2010 and 2024, tracking the relative frequency of each word as it appeared each year. They then compared the expected frequency of these words (based on the pre-2023 trend line) with the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use.
The results turned up a number of words that were extremely uncommon in these scientific abstracts before 2023 and that suddenly surged in popularity after LLMs were introduced. The word “delves,” for instance, shows up in 25 times as many 2024 papers as the pre-LLM trend would predict; words like “showcasing” and “underscores” increased in usage by nine times as well. Other, previously common words became notably more common in post-LLM abstracts: the frequency of “potential” increased by 4.1 percentage points, “findings” by 2.7 percentage points, and “crucial” by 2.6 percentage points, for instance.
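The two kinds of figures quoted above (a “25 times” ratio for rare words and a percentage-point gap for common ones) can be illustrated with a small sketch. It assumes a plain straight-line fit over the pre-2023 years as the “trend line”; the researchers' exact extrapolation may differ, and the function names and sample frequencies below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def excess_usage(freq_by_year, target_year, cutoff_year=2023):
    """Compare a word's observed frequency with its pre-cutoff trend.

    freq_by_year maps a year to the fraction of that year's abstracts
    containing the word. Returns (expected, gap, ratio), where gap is
    observed minus expected and ratio is observed divided by expected.
    """
    # Assumption: the trend is a straight line through all pre-cutoff years.
    history = sorted((y, f) for y, f in freq_by_year.items() if y < cutoff_year)
    years = [y for y, _ in history]
    freqs = [f for _, f in history]
    slope, intercept = fit_line(years, freqs)

    expected = max(slope * target_year + intercept, 0.0)
    observed = freq_by_year[target_year]
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return expected, gap, ratio


# Hypothetical data: a rare word that was flat before 2023, and a common
# word that was already rising slowly.
rare_word = {year: 0.001 for year in range(2010, 2023)}
rare_word[2024] = 0.025
common_word = {year: 0.20 + 0.002 * (year - 2010) for year in range(2010, 2023)}
common_word[2024] = 0.269

rare_expected, rare_gap, rare_ratio = excess_usage(rare_word, 2024)
common_expected, common_gap, common_ratio = excess_usage(common_word, 2024)
```

The ratio is the more telling number for rare words, where even a large multiple is a tiny absolute change, while the gap is more telling for common words, whose ratio barely moves.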
These kinds of changes in word usage can happen regardless of LLM usage, of course – the natural evolution of language means that words sometimes go in and out of style. However, the researchers found that, in the pre-LLM era, such massive and sudden year-on-year increases were only seen for words related to major world health events: “ebola” in 2015; “Zika” in 2017; and words like “coronavirus”, “lockdown” and “pandemic” in the period 2020-2022.
In the post-LLM period, however, the researchers found hundreds of words with sudden, pronounced increases in scientific usage that had no common link to world events. In fact, while the excess words during the COVID pandemic were overwhelmingly nouns, the researchers found that the words with a post-LLM frequency bump were overwhelmingly “style words” like verbs, adjectives, and adverbs (a small sampling: “across, additionally, comprehensive, crucial, enhancing, exhibited, insights, notably, particularly, within”).
This is not an entirely new finding; the increased prevalence of “delve” in scientific papers has been widely noted in the recent past, for instance. But previous studies generally relied on comparisons with “ground truth” human writing samples or lists of predefined LLM markers obtained from outside the study. Here, the set of pre-2023 abstracts acts as its own effective control group to show how vocabulary choice has changed overall in the post-LLM era.
A complicated combination
Once you highlight the hundreds of so-called “marker words” that became significantly more common in the post-LLM era, the telltale signs of LLM use can sometimes be easy to pick out. Take this example abstract line called out by the researchers: “A comprehensive grasp of the intricate interplay between […] and […] is pivotal for effective therapeutic strategies.”
After doing some statistical measures of marker word appearance across individual papers, the researchers estimate that at least 10 percent of the post-2022 papers in the PubMed corpus were written with at least some LLM assistance. The number could be even higher, the researchers say, because their set could be missing LLM-assisted abstracts that don’t include any of the marker words they identified.
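The reason such an estimate is a floor rather than a point estimate can be shown with a short sketch. It assumes the simplest possible reading of the approach: count the share of abstracts containing at least one marker word, then subtract the share the pre-LLM trend would have predicted. The function names, the three marker words chosen, and the toy abstracts are illustrative assumptions, not the researchers' actual code or data.

```python
import re


def share_with_markers(abstracts, markers):
    """Fraction of abstracts that contain at least one marker word."""
    marker_set = {m.lower() for m in markers}
    hits = 0
    for text in abstracts:
        # Crude tokenization: lowercase runs of ASCII letters only.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & marker_set:
            hits += 1
    return hits / len(abstracts)


def lower_bound_llm_share(observed_share, expected_share):
    """Excess share of marker-word abstracts over the pre-LLM expectation.

    LLM-assisted abstracts that use none of the marker words are not
    counted, so the true share can only be equal to or above this value.
    """
    return max(observed_share - expected_share, 0.0)


# Hypothetical toy corpus and marker list.
markers = ["showcasing", "comprehensive", "particularly"]
abstracts_2024 = [
    "A comprehensive analysis of treatment outcomes.",
    "We measured enzyme activity in mice.",
    "Results showcasing improved survival are reported.",
    "Samples were collected over two years.",
]

observed = share_with_markers(abstracts_2024, markers)
# Assumed share predicted from the pre-2023 trend, for illustration only.
bound = lower_bound_llm_share(observed, expected_share=0.25)
```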
Those measured percentages can vary a lot across different subsets of papers. The researchers found that papers authored in countries such as China, South Korea, and Taiwan showed LLM marker words 15 percent of the time, suggesting that “LLMs might … help non-natives with editing English texts, which could justify their extensive use.” On the other hand, the researchers offer that native English speakers “may [just] be better at noticing and actively removing unnatural style words from LLM outputs,” thereby hiding their LLM usage from this kind of analysis.
Detecting LLM use is important, the researchers note, because “LLMs are infamous for making up references, providing inaccurate summaries, and making false claims that sound authoritative and convincing.” But as knowledge of LLMs’ telltale marker words starts to spread, human editors may get better at taking those words out of generated text before it’s shared with the world.
Who knows, maybe future large language models will do this kind of frequency analysis themselves, lowering the weight of marker words to better pass off their output as human-written. Before long, we may need to call in some Blade Runners to pick out the AI-generated text hiding in our midst.