Medical Student Northwestern University Chicago, IL, US
Introduction: Accurate vertebral segmentation is an important step in the diagnosis and treatment of spinal metastases. Segmenting these metastases is especially challenging given their radiographic heterogeneity. Conventional approaches to segmenting vertebrae have included manual review or deep learning. However, manual review is time-intensive and subject to interrater variability, while deep learning models require large labeled datasets to train. The rise of generative AI, notably Meta’s “Segment Anything Model 2” (SAM2), promises the ability to rapidly generate segmentations of any image without any task-specific training.
The goal of this study was to assess the ability of SAM2 to segment vertebrae with metastases.
Methods: We used a publicly available set of spinal CT scans from The Cancer Imaging Archive, including patient sex, BMI, vertebral locations, types of metastatic lesion (lytic, blastic, or mixed), and primary cancer type. We also extracted ground-truth segmentations for each vertebra derived by neuroradiologists.
SAM2 produced segmentations for each vertebral slice without any training data, which were compared to gold standard segmentations using the Dice score. We also assessed relative performance differences across clinical subgroups using standard statistical techniques.
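As an illustration of the comparison metric, the Dice score between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, not the study's code: it assumes the per-slice masks for one vertebra have been stacked into a binary 3D array, and the function name `dice_score` and the toy arrays are hypothetical.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy volume (slices x rows x cols), standing in for one vertebra.
pred = np.zeros((2, 4, 4), dtype=bool)
truth = np.zeros((2, 4, 4), dtype=bool)
pred[:, 1:3, 1:3] = True    # 8 predicted voxels
truth[:, 1:3, 1:4] = True   # 12 reference voxels, containing all 8 predicted

example = dice_score(pred, truth)  # 2 * 8 / (8 + 12) = 0.8
print(f"Dice = {example:.3f}")
```

A score of 1 indicates identical masks and 0 indicates no overlap; computing it on the stacked volume rather than averaging per-slice scores keeps small slices from being over-weighted.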
Results: We extracted imaging data for 55 patients and 782 unique thoracolumbar vertebrae, 153 of which had metastatic tumor involvement (59 blastic, 46 lytic, 58 mixed). Across these vertebrae, SAM2 had a mean volumetric Dice score of 0.840 (0.097). There was no significant difference in SAM2 performance across sex (p = 0.46) or BMI (p = 0.27). SAM2 performed significantly worse on thoracic vertebrae relative to lumbar vertebrae (0.816 versus 0.874, p < 0.001). The model performed worse on vertebrae with mixed (0.783 [0.045]) and lytic lesions (0.820 [0.011]) than on vertebrae with blastic lesions (0.885 [0.045]) or no metastatic disease (0.842 [0.022]) (p < 0.001). Performance was lowest for urothelial (0.612 [0.026]), lung (0.738 [0.118]), and skin (0.738 [0.107]) primary cancers, and highest for soft tissue sarcoma (0.891 [0.035]), uterine (0.906 [0.027]), and cervical (0.904 [0.027]) primary cancers (p < 0.001).
Conclusion: Our results demonstrate that general-purpose segmentation models like SAM2 can provide reasonable vertebral segmentation accuracy out of the box, with efficacy comparable to previously published trained models. Future research should optimize spine segmentation models for lesion location and type.