‘Evaluation cannot be afterward’: Duke Health develops framework to evaluate AI use in care

Although Duke Health aims to use AI to advance health care, two of its researchers see the technology as a tool that requires careful evaluation and thoughtful oversight. Those considerations led Michael Pencina, vice dean for data science and chief data scientist of Duke Health, and Chuan Hong, assistant professor in biostatistics and bioinformatics, to develop SCRIBE, a framework that evaluates, across several ethical metrics, the performance of a new category of AI systems used to generate clinical notes in real time.

According to Hong, SCRIBE “offers a comprehensive evaluation by incorporating human evaluation, simulation, automated metrics and [large language models] as a charger for the best practice of AI in health care.” It tests and measures the safety, fairness and accuracy of LLMs and other generative AI tools used to take notes during an appointment.

“Evaluation of AI tools is … especially important in health care, because a lot of the time these AI tools are patient-facing,” Hong explained. “We need to make sure they are safe, and also trustworthy and have no bias.”
