Chief Resident, Brown University Health, Providence, RI, US
Introduction: While podcasts are an increasingly popular medium for medical education and research dissemination, creating high-quality research podcasts remains time-intensive and resource-demanding. Recent advancements in generative AI have enabled the creation of synthetic podcasts based on pre-selected content, including PDFs of academic papers. However, the accuracy of such AI-generated content remains understudied.
Methods: We used the "audio overview" feature of Google's NotebookLM to generate podcast episodes based on Editor's Choice and cover articles from JNS: Spine published between June and October 2024. Two clinicians independently listened to each podcast episode and evaluated its factual claims, defined as specific, verifiable statements about methodology, results, or conclusions. Discrepancies in factual verification were arbitrated by a third independent clinician in a blinded fashion.
Results: Five podcast episodes were generated, totaling 54.75 minutes of content. Individual episodes averaged 10.95 minutes (SD 1.79, range 8.5-13.2) and covered articles averaging 9.8 pages (SD 1.7, range 7-12). The podcasts contained 62 factual assertions (mean 12.4 per episode, SD 2.3). One episode initially contained a single hallucination (1.6% error rate) but was successfully regenerated without errors. Inter-observer reliability was 100% for hallucination identification.
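The summary figures in the Results can be reproduced from the reported totals. The following is a minimal sketch, not the authors' analysis code; the variable names are illustrative assumptions, and only the totals stated above are used as inputs.

```python
# Inputs taken from the Results paragraph; variable names are illustrative.
episodes = 5
total_minutes = 54.75
factual_assertions = 62
hallucinations = 1  # the single error found before regeneration

# Per-episode averages and the initial error rate.
mean_minutes = total_minutes / episodes
mean_assertions = factual_assertions / episodes
error_rate_pct = 100 * hallucinations / factual_assertions

print(f"mean episode length: {mean_minutes:.2f} min")
print(f"mean assertions per episode: {mean_assertions:.1f}")
print(f"initial hallucination rate: {error_rate_pct:.1f}%")
```

The standard deviations and ranges cannot be rechecked this way, since the abstract does not list per-episode values.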
Conclusion: AI-generated podcasts for JNS: Spine produced with Google NotebookLM demonstrate high factual reliability when paired with expert review. While the technology shows promise, maintaining clinical accuracy requires human oversight in the form of expert validation and, where needed, regeneration of content. Future research should assess listener engagement and knowledge retention through controlled trials. Implementation could proceed through a pilot program on the JNS website, followed by broader distribution on platforms such as Apple Podcasts and Spotify, with careful monitoring of user feedback and engagement metrics.