Clear Sky Science

Innovating global regulatory frameworks for generative AI in medical devices is an urgent priority


Why this matters for your health

Generative artificial intelligence, including tools like advanced chatbots, is rapidly moving into doctors' offices and hospitals. These systems can draft clinic notes, answer health questions, and even suggest diagnoses. This summary explains why our current rules for medical devices are not ready for such flexible and unpredictable technologies, and why updating them is essential for safe, fair, and trustworthy care worldwide.

Figure 1. How AI tools, doctors, and patients connect through smarter rules to deliver safer medical care.

New tools that do much more than one job

Older medical software was built to do one clearly defined task, such as spotting a tumor on a scan. Generative AI and large language models, by contrast, can handle many different jobs, from summarizing medical records to advising on treatment options. They are trained on huge collections of online text, images, and other data, which makes them powerful but also hard to fully understand or control. Because their answers can change from one use to the next, and can include confident mistakes or "hallucinations," they do not fit neatly into existing medical device categories, which were designed for more predictable tools.

Why current safety rules fall short

Regulators have tried to keep up using a "total product life cycle" approach that follows a device from design through real-world use. This helps with many kinds of AI, but the paper argues that it is not enough for large language models. It is nearly impossible to inspect all the training data for errors or hidden personal information. Evaluating how well these systems work is also tricky, because their long, open-ended answers are hard to score with simple accuracy tests. Studies show that some models perform well on exam-style questions yet struggle with messy real-life cases, and may do worse than clinicians at key tasks. On top of this, there is no agreed way to measure or monitor bias, meaning the systems could quietly work less well for certain groups of patients.

Hidden risks after deployment

Once large language model tools are released, keeping track of their safety becomes even more complex. Many models are built on shared base systems, then tweaked or retrained by different companies, making it difficult to know exactly what data and changes lie underneath. Some tools reach patients directly as health advice apps without formal approval. Problems may go unreported, especially when errors are buried in long clinic notes drafted by an AI scribe. Existing approval routes that rely on showing "similarity" to older products may be misused for tools that are, in fact, quite different. At the same time, ethical issues such as privacy, autonomy, trust, and the impact on the doctor-patient relationship are only partly addressed in current regulations.

Figure 2. How messy health data is filtered by oversight steps into safer AI tools used in everyday medical devices.

Building smarter and fairer oversight

The authors highlight emerging ideas to make regulation more flexible and effective. "Regulatory sandboxes" allow new AI tools to be tested under supervision in limited settings, so regulators and developers can learn from experience and adjust rules quickly. New concepts like "software as a medical service" aim to treat highly automated AI agents more like ongoing health services than fixed products. The paper also stresses the importance of understanding the full supply chain, from data collection and model building to cloud hosting and hardware, so that health systems can stay resilient when digital tools fail or are attacked. Global networks of regulators, researchers, and health systems are beginning to share checklists, testing standards, and oversight labs to align their efforts.

Keeping equity at the center

A major concern is how generative AI might widen or narrow the health gap between rich and poor regions. If models are trained mostly on data from high-income countries, they may perform poorly in low-resource settings or for under-represented communities. The paper calls for deliberate inclusion of perspectives and data from low- and middle-income countries, and for support to help these regions build and deploy their own AI tools safely. Health-equity-focused reporting standards and evaluation tools can surface hidden biases, while collaborations can help move successful AI from pilot projects into real clinics without leaving vulnerable groups behind.

What this means going forward

In plain terms, the article concludes that generative AI in medicine is moving too fast for existing rulebooks. To protect patients and earn trust, countries need to work together on new, adaptive regulatory frameworks that can keep pace with changing models while guarding privacy, safety, and fairness. The authors envision independent global bodies setting shared standards, much like those that exist for cybersecurity, so that hospitals and patients everywhere can benefit from these tools without being exposed to avoidable harm.

Citation: Ong, J.C.L., Ning, Y., Liu, M. et al. Innovating global regulatory frameworks for generative AI in medical devices is an urgent priority. npj Digit. Med. 9, 364 (2026). https://doi.org/10.1038/s41746-026-02552-2

Keywords: generative AI, medical devices, healthcare regulation, large language models, health equity