Clear Sky Science

From “negative” trial to positive clinical impact: mitigating eligibility criteria–induced temporal selection bias in emulated clinical trials

2026-05-27

Why who gets into a trial matters

When we hear that a clinical trial found no difference between two treatments, it is tempting to think the story ends there. But what happens when doctors later try to replay that trial using real patient records from everyday care? This study shows that the rules about who is allowed into a “copycat” trial can quietly bend the results, sometimes more than the passage of time or changes in medical practice.

Figure 1. How large health record studies replay a heart drug trial and how patient selection changes the big picture result.

From controlled trial to real life

The original WARCEF trial compared two blood-thinning drugs, warfarin and aspirin, in people with heart failure and a weakened heart pump but without a common rhythm problem called atrial fibrillation. In this carefully run experiment, more than 2,300 volunteers were randomly assigned to one drug or the other, and the trial found no clear winner for preventing death. Later, guidelines advised against routinely putting similar heart failure patients on long-term warfarin unless there was a strong reason, in part because the reduced risk of stroke was offset by more serious bleeding.

Replaying WARCEF using health records

The new study asked what happens if researchers try to “emulate” WARCEF using electronic health records from the Mayo Clinic. Instead of randomly assigning drugs, they looked at thousands of patients who happened to be prescribed either aspirin or warfarin in routine care, before and after the WARCEF trial was completed in 2014. They used statistical methods to balance out obvious differences between the groups and followed an intention-to-treat approach, analyzing each person according to the drug they started on even if their treatment later changed, to mimic the logic of the original trial.

A surprising shift after 2014

At first glance, the results suggested an important change over time. Among patients treated before 2014, the study saw no meaningful difference in death rates between the two drugs, echoing the original trial. But among patients treated after 2014, warfarin was linked to a much higher risk of death than aspirin. When the researchers combined all years, the overall picture was dominated by these later patients, making aspirin look clearly safer. This pattern might suggest that once doctors saw the trial results and new guidelines, the way warfarin was used in practice changed, and its apparent performance worsened.

How one eligibility rule distorted the picture

A closer look told a different story. To stay faithful to the trial, the team tried to apply many of the same entry rules, including a score called the Modified Rankin Score, which describes how disabled a person is after a stroke. In real-world records, that score is often recorded late, if at all. Requiring it before counting someone as “in” the study meant that many patients who died early, before any score was recorded, were never seen, making the survival curves artificially flat for years. When the researchers removed this single rule, the strange plateau vanished and the year 2014 no longer looked special. Across many different cutoff years, the same pattern appeared: the choice of eligibility rules, not the calendar date, drove most of the differences in drug effects.

Figure 2. How tight entry filters on patients quietly delay and remove early deaths, changing the apparent safety of two heart drugs.
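
For readers who like to see the mechanism in numbers, here is a minimal simulation sketch of how a late-recorded eligibility score can hide early deaths. Everything in it is an assumption made for illustration: the function names, the average survival of 5 years, and the average 2-year delay before a score is recorded are invented, and none of the numbers come from the study.

```python
import random

def simulate(n=20000, seed=0):
    """Create toy patients as (death_time, score_time) pairs, in years from treatment start.

    Both times are drawn from exponential distributions with made-up means
    (5 years to death, 2 years until the score is recorded).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        death_time = rng.expovariate(1 / 5.0)
        score_time = rng.expovariate(1 / 2.0)
        rows.append((death_time, score_time))
    return rows

def early_death_fraction(rows, horizon=1.0, require_score=False):
    """Fraction of included patients who die within `horizon` years of starting treatment."""
    if require_score:
        # A patient can only be included if the score was recorded while they
        # were still alive, so anyone who died first silently drops out.
        rows = [(d, s) for d, s in rows if s < d]
    deaths = sum(1 for d, _ in rows if d <= horizon)
    return deaths / len(rows)

rows = simulate()
print("early deaths, no score rule:  ", round(early_death_fraction(rows), 3))
print("early deaths, score required: ", round(early_death_fraction(rows, require_score=True), 3))
```

In this toy setup the drug does nothing at all, yet the cohort that requires a recorded score shows far fewer deaths in the first year, simply because people who died early rarely had time to get a score on file.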

Lessons for using big health data

The study highlights that building trial-like studies from health records is more than just copying dates and drug names. Each inclusion and exclusion rule must be translated into data that may be incomplete, delayed, or captured only for certain patients. A rule that seems harmless on paper can, in practice, filter out exactly the people who have early bad outcomes, tilting the comparison between treatments. The authors argue that careful testing of how each criterion changes who enters the study and when events occur is essential to avoid hidden selection bias.
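
One way to picture this kind of criterion-by-criterion testing is a "leave one rule out" check, sketched below. This is not the authors' code: the record fields (`age`, `death_time`, `score_time`), the two example rules, and the five toy patients are all hypothetical and chosen only to show the idea.

```python
def apply_criteria(patients, criteria):
    """Keep only patients who satisfy every rule in `criteria`."""
    return [p for p in patients if all(rule(p) for rule in criteria.values())]

def count_early_deaths(patients, horizon=1.0):
    """Count patients who die within `horizon` years of starting treatment."""
    return sum(1 for p in patients if p["death_time"] <= horizon)

def leave_one_out(patients, criteria, horizon=1.0):
    """For each rule, compare the full cohort with the cohort built without that rule."""
    full = apply_criteria(patients, criteria)
    report = {}
    for name in criteria:
        reduced = {k: v for k, v in criteria.items() if k != name}
        cohort = apply_criteria(patients, reduced)
        report[name] = {
            "n_with_rule": len(full),
            "n_without_rule": len(cohort),
            "early_deaths_with_rule": count_early_deaths(full, horizon),
            "early_deaths_without_rule": count_early_deaths(cohort, horizon),
        }
    return report

# Hypothetical toy records; times are years from treatment start.
patients = [
    {"age": 60, "death_time": 0.3, "score_time": None},  # died before any score
    {"age": 72, "death_time": 0.8, "score_time": None},  # died before any score
    {"age": 55, "death_time": 6.0, "score_time": 2.0},
    {"age": 81, "death_time": 4.0, "score_time": 1.0},
    {"age": 17, "death_time": 9.0, "score_time": 0.5},
]

# Hypothetical eligibility rules, each a yes/no test on one record.
criteria = {
    "adult": lambda p: p["age"] >= 18,
    "score_recorded": lambda p: p["score_time"] is not None,
}

for name, row in leave_one_out(patients, criteria).items():
    print(name, row)
```

A rule whose removal brings back a cluster of early deaths, as `score_recorded` does in this toy data, is a candidate for the kind of hidden selection described above and deserves a closer look before results are interpreted.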

What this means for patients and doctors

For people with heart failure and their clinicians, this work does not overturn the message of the original WARCEF trial or current guidelines. Instead, it offers a cautionary tale about how we interpret “real world” studies that try to mimic trials. Differences in results over time may reflect how we pick and monitor patients, not sudden shifts in how a drug works. Thoughtful design and transparent reporting of eligibility choices are key if we want large health record studies to truly inform everyday care.

Citation: Li, X., Rajaganapathy, S., Hu, X. et al. From “negative” trial to positive clinical impact: mitigating eligibility criteria–induced temporal selection bias in emulated clinical trials. npj Health Syst. 3, 36 (2026). https://doi.org/10.1038/s44401-026-00082-3

Keywords: trial emulation, eligibility criteria, heart failure, warfarin versus aspirin, electronic health records