Clear Sky Science · en

QR code security: an adaptive retraining approach for dynamic URL-based threat detection

2026-04-22

Why tiny squares matter to your safety

QR codes have quietly become gateways between the physical world and the internet, letting us jump to menus, payment pages, apps, and more with a single scan. But the same convenience that makes them so handy also makes them an attractive tool for criminals, who can hide dangerous web links inside innocent-looking black-and-white squares. This paper explores how an advanced kind of artificial intelligence can learn to tell safe QR-code links from harmful ones, and keep improving as new scams appear.

From handy shortcuts to hidden traps

Over the past decade, and especially during the COVID-19 pandemic, QR code use has exploded, with tens of millions of scans recorded in a short span. Most of these codes simply point to routine websites. However, attackers have realized that people rarely check where a code leads before opening the link, and often trust any code posted in public or shared by a service. By embedding malicious web addresses in QR codes, criminals can direct users to phishing pages that steal passwords, or to sites that secretly install malware. This study focuses on that invisible layer, the web address (URL) hidden inside each code, rather than on physical tampering with the printed pattern, because the URL is the real vehicle for attacks.

Why older defenses fall short

Traditional defenses try to block harmful links in two main ways. Some rely on lists of known bad URLs, which are simple but easily sidestepped once attackers change their addresses. Others use machine learning trained on hand-crafted features, such as how long a URL is or whether it contains certain symbols or words. These methods can work reasonably well, but they tend to be rigid and depend heavily on patterns seen in old data. As criminals invent new tricks and vary how their links look, these fixed models struggle to keep up, leading either to missed threats or too many false alarms.
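To make the two older approaches concrete, here is a minimal Python sketch of a list lookup and a hand-crafted feature extractor. The function names, the word list, and the choice of features are illustrative assumptions, not taken from the paper.

```python
from urllib.parse import urlsplit

# Illustrative word list (an assumption, not from the paper).
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure")


def is_blocklisted(url, blocklist):
    """Exact-match lookup: the simplest form of list-based blocking."""
    return url in blocklist


def handcrafted_features(url):
    """Compute a few surface features of the kind older classifiers rely on."""
    host = urlsplit(url).hostname or ""
    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_%=?&" for ch in url),
        # Dots in the host beyond the one separating domain and TLD.
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_suspicious_word": any(w in url.lower() for w in SUSPICIOUS_WORDS),
    }


blocklist = {"http://bad.example/a"}
print(is_blocklisted("http://bad.example/a", blocklist))  # known address is caught
print(is_blocklisted("http://bad.example/b", blocklist))  # a one-character change slips through
print(handcrafted_features("http://secure-login.example.com/verify?id=123"))
```

The second lookup shows why lists are easy to sidestep, and the fixed feature set shows why such models depend on patterns someone thought of in advance.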

A smarter reader for web addresses

The authors propose a new system built on BERT, a powerful AI model originally designed to understand natural language. Instead of sentences and paragraphs, they feed BERT the character strings that make up URLs. First, the system scans a QR code and extracts its embedded URL. That URL is then broken into tokens and passed through a compact version of BERT, which converts it into a rich numerical representation that captures subtle patterns and relationships within the string. On top of this representation, the researchers add a lightweight statistical classifier that decides whether the link is likely benign or malicious. This design lets the system pick up on complex cues that simpler models miss, even though URLs are not natural language.
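The tokenization step can be illustrated with a small sketch. BERT models conventionally use WordPiece, a greedy longest-match split into sub-word pieces; this summary does not say which tokenizer or vocabulary the authors used, so the toy vocabulary and function names below are assumptions for illustration only. The encoder and classifier stages are not shown.

```python
import re

# Toy vocabulary (hypothetical). Real BERT vocabularies hold tens of thousands of pieces.
TOY_VOCAB = {
    "http", "https", ":", "/", ".", "-", "?", "=",
    "pay", "##pal", "log", "##in", "example", "com", "ver", "##ify", "[UNK]",
}


def split_url(url):
    """Lowercase the URL and split it into alphanumeric runs and single symbols."""
    return re.findall(r"[a-z0-9]+|[^a-z0-9\s]", url.lower())


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split; continuation pieces carry a '##' prefix."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize_url(url, vocab=TOY_VOCAB):
    tokens = []
    for word in split_url(url):
        tokens.extend(wordpiece(word, vocab))
    return tokens


print(tokenize_url("http://paypal-login.example.com/verify"))
# ['http', ':', '/', '/', 'pay', '##pal', '-', 'log', '##in', '.', 'example', '.', 'com', '/', 'ver', '##ify']
```

Splitting "paypal" into "pay" and "##pal" is what lets a model see familiar fragments inside an unfamiliar address, which is one plausible reason a language model transfers to URLs at all.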

Learning and relearning as threats evolve

A key feature of the approach is that it does not remain frozen after its first training. The authors start with a balanced collection of about 20,000 labeled URLs—some safe, some malicious—from a public dataset. Once the model is tuned on this data, they connect it to a live feed of newly discovered harmful URLs from a service called URLhaus, and periodically mix these fresh examples with additional safe links. Each retraining round updates the model so it can recognize emerging attack styles while preserving what it already knows. Tests show that even after repeated updates, accuracy stays very high: around 98–99% on the original data and about 97% on larger, updated sets, with the system catching almost all malicious links while rarely flagging safe ones by mistake.
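One retraining round and the reported measures can be sketched as follows. The summary does not describe how the authors merge or balance the data, so the deduplication, the rule that a fresh malicious report overrides an older label, and the downsampling to equal class sizes are all assumptions; the function names are hypothetical, and fetching from URLhaus and the model update itself are left out.

```python
import random

MALICIOUS, BENIGN = 1, 0


def build_retraining_set(previous, fresh_malicious, fresh_benign, seed=0):
    """Merge earlier (url, label) pairs with fresh URLs, deduplicate, and balance classes."""
    labeled = dict(previous)
    for url in fresh_benign:
        labeled.setdefault(url, BENIGN)
    for url in fresh_malicious:
        labeled[url] = MALICIOUS  # assumption: the newest threat report wins
    malicious = sorted(u for u, lab in labeled.items() if lab == MALICIOUS)
    benign = sorted(u for u, lab in labeled.items() if lab == BENIGN)
    rng = random.Random(seed)
    n = min(len(malicious), len(benign))  # assumption: downsample the larger class
    rows = [(u, MALICIOUS) for u in rng.sample(malicious, n)]
    rows += [(u, BENIGN) for u in rng.sample(benign, n)]
    rng.shuffle(rows)
    return rows


def detection_metrics(y_true, y_pred):
    """Accuracy, recall on malicious links, and false-positive rate on safe links."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == MALICIOUS and p == MALICIOUS)
    fn = sum(1 for t, p in pairs if t == MALICIOUS and p == BENIGN)
    fp = sum(1 for t, p in pairs if t == BENIGN and p == MALICIOUS)
    tn = sum(1 for t, p in pairs if t == BENIGN and p == BENIGN)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


previous = [("http://a.example", BENIGN), ("http://b.example", MALICIOUS)]
rows = build_retraining_set(
    previous,
    fresh_malicious=["http://c.example", "http://a.example"],
    fresh_benign=["http://d.example", "http://e.example"],
)
print(len(rows), detection_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Keeping earlier examples in every round is one simple way to preserve what the model already knows. "Catching almost all malicious links" corresponds to high recall, and "rarely flagging safe ones" to a low false-positive rate.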

How this helps everyday users

To a layperson, the outcome is simple: when you scan a QR code, a behind-the-scenes AI can quickly decide whether the hidden link seems trustworthy. If it looks safe, you are sent on to the website; if it appears dangerous, you can be warned or blocked from visiting. By combining a strong language-style model with continuous retraining on real-world attack data, this work offers a flexible shield that adapts as scammers change tactics. Although it demands substantial computing resources, the approach shows that smart, evolving filters can make the humble QR code a much safer doorway to the online world.
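The user-facing decision can be pictured as a simple gate on the model's score. The sketch below assumes the classifier outputs a probability that the link is malicious; the two thresholds and the function name are hypothetical and are not values reported in the paper.

```python
def decide(malicious_probability, warn_at=0.5, block_at=0.9):
    """Map a model score to one of three actions: 'open', 'warn', or 'block'."""
    if not 0.0 <= malicious_probability <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if malicious_probability >= block_at:
        return "block"
    if malicious_probability >= warn_at:
        return "warn"
    return "open"


for score in (0.02, 0.60, 0.97):
    print(score, decide(score))
```

Where the thresholds sit is a product choice: lowering them catches more scams but interrupts more legitimate scans.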

Citation: Almousa, H., Alsuhibany, S.A. QR code security: an adaptive retraining approach for dynamic URL-based threat detection. Sci Rep 16, 13143 (2026). https://doi.org/10.1038/s41598-026-43002-z

Keywords: QR code security, malicious URL detection, phishing protection, BERT-based model, adaptive retraining