Abstract: PO0910
Machine Learning Classification of Tweets for Patient Dialysis Experience
Session Information
- Leveraging Technology and Innovation to Predict Events and Improve Dialysis Delivery
November 04, 2021 | Location: On-Demand, Virtual Only
Abstract Time: 10:00 AM - 12:00 PM
Category: Dialysis
- 701 Dialysis: Hemodialysis and Frequent Dialysis
Authors
- Leidner, Alexander S., Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
- Gay, Hawkins, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
- Ho, Bing, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
Background
Popular microblogging services (e.g., Twitter, Facebook) provide a continuous stream of public health information. This data has been used to monitor viral spread, medication adherence, and false health information. There are thousands of posts on Twitter daily regarding personal dialysis experience, access, and side effects. While these posts include valuable public health information, evaluating them to meaningfully assist dialysis patients is difficult because even more tweets mention dialysis in a professional context. We aimed to modify a state-of-the-art natural language model to classify posts about dialysis as personal or professional.
Methods
We filtered for posts containing the word "dialysis". Posts were manually labeled as personal or professional by a nephrologist, depending on the context in which dialysis was mentioned. The data was randomized and split into 60% training, 20% validation, and 20% testing sets. The text was preprocessed to remove extraneous characters and then used as input to two classifiers: a Bidirectional Encoder Representations from Transformers (BERT) model, which was fine-tuned, and a Multinomial Naive Bayes classifier using term frequency-inverse document frequency (TF-IDF) vectorization.
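The split and the count-based baseline described above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' code: the cleaning rules in `preprocess`, the smoothed IDF formula, the Laplace smoothing value `alpha`, and all function and class names are hypothetical choices, and a production pipeline would more likely use an established library. The BERT fine-tuning step is not shown.

```python
import math
import random
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, drop URLs and @mentions, keep letters and digits (assumed rules)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"[^a-z0-9\s]", " ", text).split()


def split_60_20_20(items, seed=0):
    """Shuffle, then cut into 60% training, 20% validation, 20% testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    a, b = int(0.6 * len(items)), int(0.8 * len(items))
    return items[:a], items[a:b], items[b:]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weighted token counts."""

    def fit(self, docs, labels, alpha=1.0):
        n = len(docs)
        # Document frequency: number of documents containing each token.
        df = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed inverse document frequency (an assumed variant).
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        # Sum TF-IDF weight of every token within each class.
        weight = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            for tok, tf in Counter(doc).items():
                weight[y][tok] += tf * self.idf[tok]
        class_count = Counter(labels)
        self.log_prior = {y: math.log(c / n) for y, c in class_count.items()}
        self.log_like = {}
        for y in class_count:
            total = sum(weight[y].values()) + alpha * len(self.idf)
            self.log_like[y] = {
                t: math.log((weight[y][t] + alpha) / total) for t in self.idf
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log posterior score."""
        scores = {}
        for y, prior in self.log_prior.items():
            score = prior
            for tok, tf in Counter(doc).items():
                if tok in self.idf:  # tokens unseen in training are ignored
                    score += tf * self.idf[tok] * self.log_like[y][tok]
            scores[y] = score
        return max(scores, key=scores.get)
```

In use, each tweet would pass through `preprocess`, the labeled pairs through `split_60_20_20`, and the model would be fit on the training portion only, with the validation portion reserved for tuning choices such as `alpha`.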
Results
We collected 6011 tweets from May 3, 2021 to May 14, 2021. A total of 1000 tweets were randomized and labeled. 57% were categorized as professional. BERT and Naive Bayes models attained 88% and 82% accuracy, respectively, on the testing data. The BERT model produced far fewer false negatives, with a small increase in false positives (Figure 1).
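The quantities compared above (accuracy, false negatives, false positives) can be computed from predicted and true labels as in the sketch below. It assumes that "personal" is the positive class, which the abstract does not state, and the function names are hypothetical. No study data is reproduced here.

```python
def confusion_counts(y_true, y_pred, positive="personal"):
    """Count true/false positives and negatives for a binary labeling."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted personal, actually professional
        elif truth == positive:
            fn += 1  # predicted professional, actually personal
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that match the true label."""
    return (tp + tn) / (tp + fp + fn + tn)
```

Under this assumed convention, a false negative is a personal post that the classifier discards as professional, which is the costlier error when the goal is to retain patient experiences.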
Conclusion
BERT's semantically rich word embeddings can enhance social media mining algorithms on dialysis content. We show the superiority of a BERT model over a traditional count-based language model. This method can be easily applied as a pre-processing step to remove noisy posts, allowing better study of dialysis and other health trends in social media. This novel processing task and pipeline have broad clinical and public health implications for reducing the amount of data and time required for accurate, real-time monitoring of patient-level posts.