Abstract: FR-PO022
Accuracy of Bayesian Improved First Name Surname Geocoding (BIFSG) for Race and Ethnicity Imputation in a Kidney Care Management Program to Assess Racial Disparities
Session Information
- AI, Digital Health, Data Science - II
November 03, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Bruce, Liana DesHarnais, Somatus, McLean, Virginia, United States
- Krasniak, Christopher S., Somatus, McLean, Virginia, United States
- Eddings, Cliff S., Somatus, McLean, Virginia, United States
- Phan, Brandon, Somatus, McLean, Virginia, United States
- Mikhael, Bassem, Somatus, McLean, Virginia, United States
- Kimura, Joe, Somatus, McLean, Virginia, United States
Background
Self-reported data are the preferred source for classifying race and ethnicity in efforts to improve equity and close disparities in health outcomes, but response rates are low, typically <20%. Our goal was to validate race and ethnicity imputed with the BIFSG algorithm against self-reported data in a kidney care management population.
Methods
Patients were assessed at baseline, and their self-reported responses were classified into six Office of Management and Budget standardized combined categories. We applied RAND’s indirect estimation method to generate race and ethnicity estimates from first names, surnames, and ZIP Codes. Accuracy, specificity, sensitivity, and positive predictive value (PPV) were then calculated by comparing BIFSG-imputed values with self-reported values in a validation subsample.
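As an illustration of the kind of estimate described above, BIFSG is commonly described as a naive Bayes update: a surname-based prior over categories is multiplied by the likelihood of the first name and of the geography given each category, then normalized. The sketch below shows that general form only. It is not the authors' or RAND's implementation; the function name and every probability value are hypothetical, and assigning the highest-probability category is an assumption about how a single imputed value might be chosen.

```python
def bifsg_posterior(p_race_given_surname, p_first_given_race, p_geo_given_race):
    """Combine the three inputs as P(r | s, f, g) proportional to
    P(r | s) * P(f | r) * P(g | r), then normalize over categories."""
    unnormalized = {
        race: p_race_given_surname[race]
        * p_first_given_race[race]
        * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no category has non-zero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for one patient (illustrative values, not real data).
surname_prior = {"White": 0.6, "Black or African American": 0.3, "Hispanic": 0.1}
first_name_likelihood = {"White": 0.002, "Black or African American": 0.004, "Hispanic": 0.001}
zip_likelihood = {"White": 0.0001, "Black or African American": 0.0005, "Hispanic": 0.0002}

posterior = bifsg_posterior(surname_prior, first_name_likelihood, zip_likelihood)

# Assumed assignment rule: take the category with the highest posterior.
imputed_category = max(posterior, key=posterior.get)
print(imputed_category, posterior)
```

In practice a minimum-probability threshold might be applied before assigning a category, which would leave some patients unclassified; the abstract does not state which rule was used.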
Results
Of 326,679 patients, 53,695 (16%) self-reported race/ethnicity. BIFSG produced a prediction for 269,354 patients (82% of the overall population), including 44,964 patients in the self-report cohort, who formed the validation subsample. After imputation, 278,085 patients (85%) had non-missing race/ethnicity. Overall accuracy of imputed values against self-report was 99%. PPV was highest for Hispanic (92.23%) and lowest for American Indian or Alaskan Native (51.35%). Accuracy was highest for Asian or Pacific Islander (99.94%), closely followed by American Indian or Alaskan Native (99.91%), and lowest for White (90.11%).
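The coverage figures follow from the reported counts: patients with either a self-report or a BIFSG prediction are counted once, so the overlap between the two groups is subtracted. A minimal check of that arithmetic, using only the numbers reported above (the variable names are illustrative):

```python
total_patients = 326_679
self_reported = 53_695      # patients with self-reported race/ethnicity
bifsg_predicted = 269_354   # patients with a BIFSG prediction
overlap = 44_964            # patients with both (self-report and prediction)

# Inclusion-exclusion: count each patient with any value exactly once.
covered = self_reported + bifsg_predicted - overlap

self_report_pct = round(100 * self_reported / total_patients)
predicted_pct = round(100 * bifsg_predicted / total_patients)
coverage_pct = round(100 * covered / total_patients)

print(covered, self_report_pct, predicted_pct, coverage_pct)
```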
Conclusion
Imputation of race/ethnicity can improve analyses of health disparities in kidney disease. The BIFSG imputation model obtained highly accurate (99%) predictions of race and ethnicity in a large chronic kidney disease population, increasing race/ethnicity coverage from 16% to 85%. The BIFSG algorithm could be supplemented with additional sources (e.g., historical records) to impute residual missing values. Incorporating additional data and advanced machine learning models will improve predictions to better track health disparities.
Metric | Calculation | Overall | White | Black or African American | Hispanic | Asian or Pacific Islander | American Indian or Alaskan Native |
Precision (PPV) | TP/(TP+FP) | 80.53% | 75.10% | 89.14% | 92.23% | 69.20% | 51.35% |
Recall (Sensitivity) | TP/(TP+FN) | 80.48% | 93.96% | 64.66% | 82.79% | 60.49% | 6.55% |
Specificity | TN/(TN+FP) | 99.31% | 88.71% | 99.45% | 99.90% | 99.98% | 99.99% |
Accuracy | (TN+TP)/(TN+TP+FP+FN) | 98.67% | 90.11% | 97.17% | 99.67% | 99.94% | 99.91% |
TP: True Positives, FP: False Positives, FN: False Negatives, TN: True Negatives
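The formulas in the table can be applied per category in a one-vs-rest fashion: for a given category, a patient is a true positive when both the self-reported and imputed values equal that category, and a true negative when neither does. The sketch below is one way to compute the four metrics under that assumption. The function name and the toy labels are hypothetical and are not the study data.

```python
def one_vs_rest_metrics(self_reported, imputed, category):
    """Compute PPV, sensitivity, specificity, and accuracy for one category,
    treating self-reported values as the reference standard."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(self_reported, imputed):
        if predicted == category and truth == category:
            tp += 1
        elif predicted == category:
            fp += 1
        elif truth == category:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy labels for six hypothetical patients (not real data).
truth = ["White", "White", "Black", "Hispanic", "White", "Black"]
predicted = ["White", "White", "White", "Hispanic", "Black", "Black"]

white_metrics = one_vs_rest_metrics(truth, predicted, "White")
print(white_metrics)
```

Because true negatives dominate for small categories, one-vs-rest accuracy and specificity can be very high even when sensitivity is low, as the American Indian or Alaskan Native column in the table shows.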
Funding
- Commercial Support – Somatus