Abstract: FR-PO030
Comparison of a Deep Learning Model with Human Expert Annotations for Segmentation of Kidneys, Tumors, and Cysts in Routine CT Imaging Exams
Session Information
- AI, Digital Health, Data Science - II
November 03, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Kline, Timothy L., Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Cook, Cole J., Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Gregory, Adriana, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Klug, Jason R., Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Potretzke, Theodora A., Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Ron, Eyal, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Takahashi, Naoki, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Erickson, Bradley J., Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Khanna, Abhinav, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Sharma, Vidit, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
- Leibovich, Bradley, Mayo Foundation for Medical Education and Research, Rochester, Minnesota, United States
Background
This study explores whether a deep learning model for automatic kidney, tumor, and cyst segmentation from abdominal CT images can match human interrater agreement.
Methods
A deep learning model was developed to segment the kidneys, tumors, and cysts from abdominal CTs. The training/validation set consisted of 1003 images from 479 unique subjects. A urologic oncologist with expertise in renal tumor evaluation and treatment (reference standard) and two radiology residents with experience in general abdominal CT imaging manually segmented 30 images, a subset of the held-out test set. Segmentation overlap between the reference standard and either the residents' or the AI segmentations was assessed via the Dice coefficient. Confidence intervals (CI) for the probability that the Dice coefficient between the reference standard and the AI segmentation was larger than that with the residents, based on a Mann-Whitney U equivalence test, were generated for left kidney, right kidney, tumor, and cyst.
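For readers unfamiliar with the overlap metric, the Dice coefficient between two binary masks A and B is 2|A∩B| / (|A| + |B|). The following is a minimal sketch of that definition in Python with NumPy, not the authors' implementation; the function name `dice_coefficient`, the toy masks, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy 2D masks (illustrative only, not study data).
reference = np.array([[1, 1, 0],
                      [0, 1, 0]])
candidate = np.array([[1, 0, 0],
                      [0, 1, 1]])

# Each mask has 3 foreground pixels and they share 2, so Dice = 4/6.
example_dice = dice_coefficient(reference, candidate)
```

The empty-mask convention matters for structures that may be absent from an image; the abstract does not state which convention was used.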
Results
The mean and standard deviation of the Dice coefficients between reference standard and residents (AI) were 0.91±0.03 (0.93±0.03) for left kidney, 0.92±0.02 (0.94±0.02) for right kidney, 0.80±0.23 (0.86±0.18) for tumor mask, and 0.24±0.35 (0.42±0.37) for cyst mask, 0.81±0.24 (0.83±0.23) (see Figure 1). The 90% CIs tended to be greater than 0.5 in all cases except the tumor masks (see Table), suggesting the AI often performs within expected interrater agreement.
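The 0.5 threshold can be read through the Mann-Whitney U statistic: dividing U by the number of (AI, resident) pairs estimates the probability that a randomly chosen AI Dice score exceeds a randomly chosen resident Dice score, and 0.5 means neither is favored. The sketch below computes that estimate and a percentile bootstrap interval. The bootstrap is a stand-in chosen for illustration, not the equivalence-test procedure used in the study, and the function names and toy scores are hypothetical.

```python
import numpy as np


def probability_of_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y), i.e. Mann-Whitney U / (n * m)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    wins = (x > y).sum()
    ties = (x == y).sum()
    return float((wins + 0.5 * ties) / (x.shape[0] * y.shape[1]))


def bootstrap_ci(x, y, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap CI (assumed stand-in for the study's method)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement and recompute the estimate.
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        stats[i] = probability_of_superiority(xb, yb)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Toy Dice scores against the reference standard (not study data).
ai_dice = [0.93, 0.95, 0.90, 0.94]
resident_dice = [0.91, 0.92, 0.90, 0.89]

estimate = probability_of_superiority(ai_dice, resident_dice)
ci_low, ci_high = bootstrap_ci(ai_dice, resident_dice)
```

This sketch resamples the two groups independently; scores measured on the same images are paired, and a paired resampling scheme may be more appropriate in that setting.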
Conclusion
A fully automated kidney, tumor, and cyst segmentation algorithm was trained and evaluated against three independent readers. The AI algorithm's agreement with the reference standard was comparable to interrater agreement among the human readers.