
CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China


Why this long-term heart data matters

Heart disease is the leading cause of death worldwide, yet doctors and policymakers often lack detailed, real-world information on how patients move through the health system over many years. This study introduces CardioEHR, a large, carefully anonymized collection of hospital records from tens of thousands of cardiovascular patients in central China. Covering both the years before and after COVID-19, the dataset provides a rare window into how people seek care, how their illnesses unfold, and how changes in policy and society ripple through everyday medical practice.

Figure 1

A decade of real hospital life

CardioEHR brings together two large sets of electronic health records from Wuhan Union Hospital, a major medical center in central China. One set comes from an older hospital system that tracked patients from 2010 to 2020; the other comes from a newer, research-focused platform that spans 2011 to 2024. In total, the resource includes more than 70,000 patients whose care mainly revolves around heart and related long-term conditions. For each person, the data include basic characteristics such as age and sex, hospital admissions and discharges, diagnoses, lab test results (including COVID-19 tests), and where they live. Because records stretch across many years, researchers can follow how a patient’s care changes over time rather than only seeing isolated hospital stays.

Who these patients are and how they move through care

The authors examined how patients flow through different hospital departments and how this has shifted between the older and newer record systems. In the earlier cohort, most people were between 50 and 70 years old and were admitted to and discharged from the cardiology department, reflecting a steady stream of older patients with serious heart problems. Transfers to other departments were less common but hinted at patients with multiple chronic diseases. In the later cohort, the typical patient was somewhat younger and entered the hospital through a wider mix of departments, with more frequent moves between services. This pattern suggests that the newer system captures a broader and more complex mix of illnesses, giving a more complete picture of how cardiovascular problems intersect with other conditions.

The role of place and time

Beyond hospital walls, the team linked each patient’s de-identified home region to public statistics from the China Statistical Yearbook, such as local income, number of hospitals, available beds, and number of doctors. This allows researchers to study how regional wealth and health resources relate to who gets hospitalized and how often they return. The authors also looked at monthly trends in visit numbers and the time between repeat visits. They found regular patterns of follow-up in these chronically ill patients, as well as changes over the years that may reflect health reforms, seasonal effects, or the disruptions and adaptations brought on by the COVID-19 pandemic.
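For readers who want to see what this kind of analysis looks like in practice, here is a minimal Python sketch of the two ideas in this section: measuring the time between a patient's repeat visits, and attaching regional indicators to each visit. The field names, region codes, and numbers are invented for illustration and are not taken from the actual CardioEHR tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit rows: (patient_code, region_code, admission_date).
visits = [
    ("p1", "R01", date(2015, 1, 10)),
    ("p1", "R01", date(2015, 4, 10)),
    ("p1", "R01", date(2015, 7, 9)),
    ("p2", "R02", date(2016, 3, 1)),
]

# Hypothetical regional indicators keyed by region code (values are made up).
region_stats = {
    "R01": {"hospital_beds": 1200, "doctors": 300},
    "R02": {"hospital_beds": 400, "doctors": 90},
}


def revisit_gaps(rows):
    """Return, per patient, the number of days between consecutive visits."""
    by_patient = defaultdict(list)
    for patient, _region, day in rows:
        by_patient[patient].append(day)
    gaps = {}
    for patient, days in by_patient.items():
        days.sort()
        gaps[patient] = [(b - a).days for a, b in zip(days, days[1:])]
    return gaps


def attach_region(rows, stats):
    """Add the regional indicators to each visit row (a simple lookup join)."""
    return [
        {"patient": p, "region": r, "date": d, **stats.get(r, {})}
        for p, r, d in rows
    ]


gaps = revisit_gaps(visits)
linked = attach_region(visits, region_stats)
print(gaps)
print(linked[0])
```

A patient with a single visit gets an empty list of gaps, so one-time visitors are easy to separate from patients in regular follow-up.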

How privacy is protected while keeping details useful

To make CardioEHR safe for sharing, the team applied a strict, multi-step process to strip away direct identifiers and blur sensitive details without destroying the medical story. Names, ID numbers, exact addresses, and phone numbers were removed, and each person was assigned a one-way encrypted code so their records could still be linked across tables. Actual calendar dates were shifted by a random amount unique to each patient, preserving the order and spacing of their visits but hiding the real dates. Diagnoses were mapped to standard codes, rare labels were grouped, and lab tests were converted to common units and checked for outliers. The final dataset is organized into five clean tables (patient details, visits, diagnoses, lab tests, and regional socioeconomic indicators) for each of the two cohorts, all accessible under a controlled data-use agreement.
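The two central tricks described here, a one-way patient code and a per-patient date shift, can be illustrated with a short Python sketch. This is not the authors' actual pipeline: the keyed hash (HMAC-SHA256), the secret key, the one-year shift range, and the function names are all assumptions chosen for illustration. The sketch also derives each patient's shift from the key rather than drawing it at random, so two patients could in principle receive the same offset.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held only by the data custodian (assumption).
SECRET_KEY = b"replace-with-a-real-secret"
# Assumed maximum shift in days; the source does not state the range used.
MAX_SHIFT_DAYS = 365


def pseudonym(patient_id: str) -> str:
    """One-way code: the same patient always maps to the same code."""
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def shift_days(patient_id: str) -> int:
    """Per-patient offset in [-MAX_SHIFT_DAYS, +MAX_SHIFT_DAYS]."""
    digest = hmac.new(
        SECRET_KEY, b"shift:" + patient_id.encode(), hashlib.sha256
    ).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    return int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS


def deidentify(patient_id, visit_dates):
    """Replace the ID with a code and move every date by the same offset."""
    offset = timedelta(days=shift_days(patient_id))
    return pseudonym(patient_id), [d + offset for d in visit_dates]


original = [date(2018, 3, 1), date(2018, 3, 15), date(2019, 1, 2)]
code, shifted = deidentify("patient-0001", original)

# The spacing between visits survives even though the real dates are hidden.
original_gaps = [(b - a).days for a, b in zip(original, original[1:])]
shifted_gaps = [(b - a).days for a, b in zip(shifted, shifted[1:])]
print(code, original_gaps == shifted_gaps)
```

Because every date for a given patient moves by the same number of days, the order of visits and the intervals between them are unchanged, which is what keeps the records useful for studying follow-up and readmission.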

Figure 2

What this resource means for future health

In plain terms, CardioEHR is a long, anonymized diary of how tens of thousands of people with heart and related diseases interact with the Chinese health system over more than a decade. Because it combines clinical details, living conditions, and the unique period before and after COVID-19, it can help scientists build better prediction tools, policymakers test the impact of reforms, and hospitals understand where care is working or falling short. By carefully balancing privacy with detail, the dataset opens a powerful new window into cardiovascular health and healthcare delivery in one of the world’s largest populations.

Citation: Zha, L., Fu, C., Sha, X. et al. CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China. Sci Data 13, 451 (2026). https://doi.org/10.1038/s41597-026-06855-7

Keywords: cardiovascular patients, electronic health records, China hospital data, longitudinal health dataset, COVID-19 healthcare use