Week 2 (3/2-3/6) - Deep Dive into Clinical Practice Guidelines!
March 6, 2026
Welcome back to my blog! This week, I will discuss how I finished preparing and cleaning the data, and how I started looking into ways to find and process CPGs (clinical practice guidelines).
The data was messy, with many missing values and unnecessary columns. Even a few missing values propagated as the data grew, since more information on each stay was added to the final dataset. Thus, I had to go back and clean the data, dropping irrelevant columns like the medication class code. For the information recorded at triage and the vitals taken during the visit, I dropped all stays with missing data, as there was no way to fill in these values manually and I could not drop the entire feature. As I merged the data, I made sure to include only stays where information was available for all features.
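A minimal sketch of this kind of cleaning in pandas might look like the following. The two toy tables, the column names (`stay_id`, `temperature`, `heartrate`, `med_name`, `med_class_code`) and all values are made up for illustration; they are assumptions, not the actual fields or code used in this project.

```python
import pandas as pd

# Toy stand-ins for two tables keyed by a stay identifier.
# Every column name and value here is an illustrative assumption.
triage = pd.DataFrame({
    "stay_id": [1, 2, 3, 4],
    "temperature": [98.6, None, 99.1, 97.9],
    "heartrate": [80.0, 72.0, None, 90.0],
})
meds = pd.DataFrame({
    "stay_id": [1, 2, 4, 5],
    "med_name": ["aspirin", "insulin", "lisinopril", "metformin"],
    "med_class_code": ["A1", "B2", "C3", "D4"],
})

# Drop a column that is not needed downstream.
meds = meds.drop(columns=["med_class_code"])

# Drop every stay that is missing any triage measurement.
triage_clean = triage.dropna(subset=["temperature", "heartrate"])

# An inner merge keeps only stays that appear in both tables,
# so the result has values for every feature.
merged = triage_clean.merge(meds, on="stay_id", how="inner")

print(merged)
```

Dropping incomplete stays before the merge, and merging with `how="inner"`, means missing values cannot multiply as more tables are joined in.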
As I mentioned in my last post, I was also looking for a way to convert between the ICD codes in the MIMIC-IV demo dataset and the MeSH terms used to search for CPGs. Originally, I found an article that described a possible route through the UMLS (Unified Medical Language System) [1]. However, after discussing it with my mentors and trying the online system, I realized that the two terminologies are too different for a direct conversion. ICD codes are used by medical professionals for note-taking and patient records, while MeSH was created to make academic research in medicine easier to organize and search.
Since there are too many unique ICD codes in the data to manually search for a MeSH equivalent or medical advice for each one, my new idea is to group the ICD titles into fewer, more general topics using AI. From there, I would find corresponding MeSH terms for these larger concepts and use them in my search. I also plan to expand my search to systematic reviews and credible online sources like the AMA, Mayo Clinic, and Cleveland Clinic, as these have already done the work of thoroughly reviewing the literature and summarizing the results. I gave both Gemini and Claude the list of ICD titles and the purpose of the grouping, compared their results, and chose Claude’s classification. It provided 18 groups instead of 14, allowing more nuance without adding much to the search effort.
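The grouping-then-searching idea can be sketched roughly as below. This is only an illustration under stated assumptions: the group names, the title-to-group assignments, and the MeSH heading chosen for each group are hypothetical placeholders, not the 18 groups mentioned above. The query string uses PubMed's `[MeSH Terms]` and `[Publication Type]` field tags.

```python
from collections import defaultdict

# Hypothetical ICD title -> topic assignments, standing in for
# whatever grouping an LLM returns. Not the real classification.
title_to_group = {
    "Heart failure, unspecified": "Cardiovascular",
    "Essential (primary) hypertension": "Cardiovascular",
    "Pneumonia, unspecified organism": "Respiratory",
}

# Hypothetical hand-picked MeSH heading for each topic.
group_to_mesh = {
    "Cardiovascular": "Cardiovascular Diseases",
    "Respiratory": "Respiratory Tract Diseases",
}


def group_titles(mapping):
    """Invert a title -> group mapping into group -> list of titles."""
    groups = defaultdict(list)
    for title, group in mapping.items():
        groups[group].append(title)
    return dict(groups)


def build_query(mesh_term):
    """Build a PubMed-style query restricted to practice guidelines."""
    return (
        f'"{mesh_term}"[MeSH Terms] '
        'AND "Practice Guideline"[Publication Type]'
    )


groups = group_titles(title_to_group)
queries = {group: build_query(group_to_mesh[group]) for group in groups}

for group, query in queries.items():
    print(f"{group} ({len(groups[group])} ICD titles): {query}")
```

With this layout, one search per topic replaces one search per ICD code, and the original titles remain attached to each topic for later lookup.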
The other problem was how to efficiently determine how much literature, and which literature, would be best for the LLM. In my literature review, I found that recent research has started using transformer models to retrieve medical literature and improve its representation as input to the models [2]. These models extract the core information, organized in the PICO format (Population, Intervention, Comparison, Outcome), and then assess the quality and ambiguity of the clinical recommendations in the literature, which helps reduce bias. Previous projects have used these models to create their own corpora, large collections of medical literature that have already been processed [3, 4].
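To make the PICO idea concrete, here is one possible way to hold an extracted record and turn it into plain text for a model. The class name, field names, strength labels, and the example values are all assumptions made up for illustration; they do not come from the cited papers or corpora.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PicoRecord:
    """One extracted finding in PICO form (hypothetical structure)."""
    population: str
    intervention: str
    comparison: Optional[str]
    outcome: str
    # Recommendation strength label; the label set is an assumption.
    strength: Optional[str] = None

    def to_prompt_text(self) -> str:
        """Render the record as labeled lines of plain text."""
        parts = [
            f"Population: {self.population}",
            f"Intervention: {self.intervention}",
            f"Comparison: {self.comparison or 'not reported'}",
            f"Outcome: {self.outcome}",
        ]
        if self.strength:
            parts.append(f"Recommendation strength: {self.strength}")
        return "\n".join(parts)


# A made-up record, only to show the shape of the data.
example = PicoRecord(
    population="adults with condition X",
    intervention="treatment A",
    comparison=None,
    outcome="30-day readmission",
    strength="strong",
)

print(example.to_prompt_text())
```

Keeping each finding in a fixed set of fields like this makes it easier to compare sources and to spot records where a field, such as the comparison, was never reported.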
For next week, I will continue investigating the best path forward for the medical literature needed to fine-tune the LLM. Perhaps the strengths of these models (extracting the important information in PICO format and labeling recommendation strength) can be combined and applied to the new medical literature I collect, though this may be limited by computational power.
If you have worked with LLMs before, let me know what advice you have for preparing the input data for the model! Next week, I will learn more about machine learning and make a final decision on fine-tuning the LLM before starting the process.
[1] Cimino, J. J., Johnson, S. B., Peng, P., & Aguirre, A. (1993). From ICD9-CM to MeSH using the UMLS: a how-to guide. Proceedings. Symposium on Computer Applications in Medical Care, 730–734. https://pubmed.ncbi.nlm.nih.gov/8130572/
[2] Xu, Z., Ma, H., Ding, Y., Zhang, G., Weng, C., & Peng, Y. (2025). Natural Language Processing in Support of Evidence-based Medicine: A Scoping Review. Findings of the Association for Computational Linguistics: ACL 2025, 21421–21443. https://doi.org/10.18653/v1/2025.findings-acl.1103
[3] Nye, B., Li, J. J., Patel, R., Yang, Y., Marshall, I. J., Nenkova, A., & Wallace, B. C. (2018). A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature. Proceedings of the Conference. Association for Computational Linguistics. Meeting, 2018, 197. https://pmc.ncbi.nlm.nih.gov/articles/PMC6174533/
[4] Read, J., Velldal, E., Cavazza, M., & Georg, G. (2016). A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations. ACL Anthology, 1724–1731. https://aclanthology.org/L16-1272/
