Introduction: The advent of large language models (LLMs) represents a paradigm shift in academic writing, including in spine surgery research. This study evaluates the prevalence of LLM-generated content in abstracts presented at three international conferences featuring spine surgery research: the American Association of Neurological Surgeons (AANS), the Scoliosis Research Society (SRS), and EUROSPINE.
Methods: A total of 1,514 abstracts were analyzed: 421 from 2021, 507 from 2022, and 586 from 2023. Abstracts were retrieved in PDF format from the respective conference proceedings and converted into machine-readable text using optical character recognition. We used a zero-shot, domain-agnostic LLM text detector to identify abstracts containing significant segments of LLM-generated content. The proportion of flagged abstracts was calculated for each year, and chi-squared tests were applied to assess the statistical significance of year-on-year increases, with significance set at p < 0.05.
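A minimal sketch of the year-on-year chi-squared comparison described above, written in Python with SciPy; the counts are taken from the Results below, and the detector pipeline and exact test configuration (e.g., continuity correction) are assumptions not specified in the source.

```python
# Sketch of the 2x2 chi-squared comparisons of flagged-abstract proportions.
from scipy.stats import chi2_contingency

# (flagged, not flagged) abstracts per conference year, from the Results.
counts = {
    2021: (3, 421 - 3),
    2022: (5, 507 - 5),
    2023: (42, 586 - 42),
}

def compare_years(a: int, b: int) -> None:
    """Chi-squared test on a 2x2 table comparing flagged proportions in two years."""
    table = [list(counts[a]), list(counts[b])]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{a} vs {b}: chi2 = {chi2:.2f}, p = {p:.2e}")

compare_years(2021, 2023)  # reported as p < 0.001
compare_years(2022, 2023)  # reported as p < 0.001
```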
Results: In 2021, 3 of 421 abstracts (0.71%) were flagged for containing significant LLM-generated content, increasing to 5 of 507 abstracts (0.99%) in 2022. By 2023, this number surged to 42 of 586 abstracts (7.17%). The increase in LLM-generated content between 2021 and 2023 was statistically significant (p < 0.001), as was the rise from 2022 to 2023 (p < 0.001). AANS accounted for 18 flagged abstracts, SRS for 10, and EUROSPINE for 14, with no statistically significant differences between the three conferences.
Conclusion: The accelerated use of LLMs in scientific abstracts, coinciding with ChatGPT's release, signals a foundational shift in academic writing that is likely to expand. Establishing guidelines for transparent LLM use is essential to maintain scientific integrity as these tools become increasingly embedded in academic writing.