Resident University of Miami Miller School of Medicine University of Miami Miami, FL, US
Disclosure(s):
Adham M. Khalafallah, MD: No financial relationships to disclose
Introduction: Prior research on artificial intelligence (AI)-based tools has demonstrated their potential to deliver comprehensible, specific, and satisfactory responses to medical questions on spine-related topics, generally showing good accuracy and completeness. However, previous studies have not examined how these tools perform across different languages, leaving a gap in understanding their effectiveness for non-English-speaking patient education. This study evaluates the performance of AI in answering common patient questions about spine conditions across multiple languages.
Methods: ChatGPT 3.5 was presented with a standardized set of 20 questions related to four spine pathologies: compression fractures, spinal metastases, disc herniation, and sciatica. The questions were translated into Spanish and Arabic. Responses were assessed by three bilingual spine surgeons using a 5-point Likert scoring system on two criteria: accuracy and completeness.
Results: The analysis revealed significant differences in completeness for both Spanish and Arabic (p < 0.005), indicating less detailed responses in the non-English languages. No statistically significant differences in accuracy were found for either Arabic or Spanish, suggesting that accuracy levels were comparable to English.
Conclusion: ChatGPT delivers comparable levels of accuracy when responding to common patient questions about spine conditions in English, Spanish, and Arabic. However, significant differences in completeness were observed, with responses in Spanish and Arabic being less comprehensive than those in English. This suggests that while the AI maintains consistent accuracy across languages, it struggles to provide the same depth of information in non-English languages. These limitations are important for healthcare providers to recognize when considering AI-generated patient information materials for non-English-speaking patients. The findings also highlight the need for targeted AI pre-training to enhance the depth and comprehensiveness of AI-generated responses in non-English settings, ensuring equitable patient education across diverse populations.