Clear Sky Science
A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization
Why this matters for patients and families
When someone leaves the hospital, families often turn to the internet with worried questions: Why was this test done? Were those medicines really needed? Today’s hospital records hold many of the answers, but they are written for doctors, not patients. This article introduces ArchEHR-QA, a new dataset designed to help researchers build and test artificial intelligence (AI) tools that can turn dense hospital notes into clear, accurate answers to real patient questions.

From online worries to hospital records
The researchers started with a simple idea: use real questions that people post on public health forums and pair them with real hospital records that could answer those questions. They collected patient and caregiver posts from a popular medical discussion site, focusing on situations where someone had recently been in the intensive care unit (ICU) or emergency department. These are times when people often feel scared and confused, and when discharge instructions and online searches may leave important concerns unresolved.
Building realistic question–answer pairs
Because the people in the forums and the patients in the hospital database are different individuals, the team carefully matched each online question with a de-identified hospital discharge summary that described a very similar medical situation. Clinicians then rewrote each layperson’s question into a short, precise version that a doctor might use, without changing what the patient actually wanted to know. Next, they combed through each hospital note sentence by sentence, marking which lines were essential, which were helpful extras, and which were not needed to answer the question. Finally, licensed clinicians wrote short, plain-language answers grounded only in the marked parts of the hospital record.

What the new dataset contains
The finished ArchEHR-QA collection includes 134 patient cases: 104 involving ICU stays and 30 from emergency visits. For each case, there is the original patient question, the clinician’s rephrased version, a carefully trimmed excerpt of the hospital note, sentence-level importance labels, and a clinician-written answer of about five sentences. The cases span many specialties—such as heart disease, lung problems, infections, and brain conditions—and cover a wide range of ages and backgrounds. All materials are shared in standard digital formats so that other researchers can easily use them.
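For readers who want a concrete picture, a single case can be imagined as a small structured record. This is only an illustrative sketch: the field names and label values below are hypothetical, not the dataset's actual schema, which is described in the paper's data-sharing materials.

```python
# Illustrative sketch of one ArchEHR-QA-style case record.
# All field names and label values here are hypothetical examples,
# not the dataset's real schema.
sample_case = {
    "case_id": "case_001",
    "patient_question": "Why did my father need a breathing tube in the ICU?",
    "clinician_question": "What was the indication for intubation?",
    "note_excerpt": [
        # Each note sentence carries an importance label for the question.
        {"id": 1, "text": "Admitted with acute hypoxic respiratory failure.",
         "relevance": "essential"},
        {"id": 2, "text": "Intubated for airway protection and oxygenation.",
         "relevance": "essential"},
        {"id": 3, "text": "Diet advanced as tolerated.",
         "relevance": "not-relevant"},
    ],
    "answer": "A clinician-written, plain-language answer of about five sentences.",
}

def essential_sentence_ids(case):
    """Return the IDs of note sentences marked as essential evidence."""
    return [s["id"] for s in case["note_excerpt"]
            if s["relevance"] == "essential"]

print(essential_sentence_ids(sample_case))  # [1, 2]
```

A structure like this makes the sentence-level importance labels directly usable: a program can pull out exactly the lines clinicians judged necessary for the answer.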
Putting today’s AI models to the test
To show how ArchEHR-QA can be used, the authors evaluated several modern large language models that can run locally. They asked each model to answer the questions using the hospital note excerpts and to point to the exact sentences that supported its answers. The team then measured two things: how well the models chose the right evidence in the note (factuality) and how closely their answers matched the clinician-written responses (relevance). Different prompting strategies were tried, including asking the model to write the answer and pick evidence in one step, or to answer first and add evidence afterward. Overall, the best setups correctly captured about half of the most important sentences and produced answers that were somewhat, but far from perfectly, aligned with expert explanations.
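The evidence-selection side of this evaluation can be sketched as a simple overlap score: compare the sentence IDs a model cites against the sentences clinicians marked essential. This is a minimal illustration of the idea, not the paper's exact scoring procedure.

```python
# Minimal sketch of sentence-level evidence scoring: how well do a model's
# cited sentences overlap with the clinician-marked essential sentences?
# The actual metric used in the paper may differ in its details.

def evidence_f1(predicted_ids, gold_ids):
    """Precision, recall, and F1 over cited sentence IDs."""
    pred, gold = set(predicted_ids), set(gold_ids)
    if not pred or not gold:
        return 0.0, 0.0, 0.0
    tp = len(pred & gold)                      # correctly cited sentences
    precision = tp / len(pred)
    recall = tp / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Example: the model cites sentences 1, 3, 4; clinicians marked 1, 2, 4.
p, r, f = evidence_f1([1, 3, 4], [1, 2, 4])
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

Under a score like this, "capturing about half of the most important sentences" corresponds to recall near 0.5: the model found roughly half of the evidence that experts considered necessary.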
Where today's models still fall short
The study also examined where the models went wrong. Sometimes they cited the right hospital sentences but misinterpreted them, or they relied too much on the wording of the patient’s question instead of the record itself. These flaws underline why strong benchmarks are needed before AI can safely draft messages for clinicians to review. ArchEHR-QA has already been used in an international research challenge, where dozens of teams experimented with multi-step systems that first find relevant sentences and then generate answers. The dataset can also support related tasks, such as finding key information in long notes or summarizing patient questions.
What this means for future care
In plain terms, this article offers a foundation for building trustworthy digital helpers that can explain hospital care in language patients understand, backed by what is actually written in their charts. By tying real-world questions to real clinical evidence and expert answers, ArchEHR-QA makes it possible to measure whether AI systems are both accurate and helpful. If such systems continue to improve, they could one day draft clear, individualized explanations for clinicians to review, reducing inbox overload while giving patients and families faster, more reliable answers about what happened in the hospital and what comes next.
Citation: Soni, S., Demner-Fushman, D. A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization. Sci Data 13, 523 (2026). https://doi.org/10.1038/s41597-026-06639-z
Keywords: electronic health records, patient questions, medical AI, clinical notes, question answering