Vision transformers- Kolmogorov–Arnold networks-based consumer driven surface cracks classification model
Why Cracks in Everyday Structures Matter
Cracks in roads, bridges, and building walls may start as hairline fractures, but they can grow into serious safety hazards and expensive repairs. Today, most crack checks still rely on people walking around with clipboards or cameras, which is slow, costly, and easy to get wrong—especially for tiny or hidden flaws. This paper introduces a new computer-based method that spots and classifies surface cracks in concrete and asphalt with very high accuracy, while being efficient enough to run on phones, drones, or other small devices. That opens the door to routine, low-cost monitoring of the structures we use every day.
From Manual Checks to Smart Cameras
Inspecting surfaces by eye has clear drawbacks: it is subjective, time-consuming, and sometimes dangerous for inspectors working on busy roads or high bridges. Earlier computer programs tried to find cracks in photos using simple tricks such as edge detection and thresholding, but they struggled with shadows, changing light, or rough textures that can look like cracks. More recent systems use machine learning, where algorithms learn patterns from many images. Convolutional neural networks and newer vision transformers have already pushed accuracy much higher, yet most still have trouble handling fine, irregular cracks under real-world conditions and rarely explain how they reach their decisions.
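To make the weakness of fixed thresholding concrete, here is a minimal sketch in Python with NumPy. The synthetic image, the brightness values, and the `threshold_crack_mask` helper are all illustrative assumptions and are not taken from the paper. The sketch only shows how a rule that treats "dark" as "crack" can be fooled once a shadow falls across the surface.

```python
import numpy as np

def threshold_crack_mask(gray, thresh=0.35):
    """Flag every pixel darker than `thresh` as a crack candidate."""
    return gray < thresh

# Synthetic 8x8 surface (assumed values): bright concrete at 0.8
# with a single dark crack running down column 3 at 0.1.
surface = np.full((8, 8), 0.8)
surface[:, 3] = 0.1
clean_mask = threshold_crack_mask(surface)

# The same surface with a shadow dimming the right half to 40% brightness.
shadowed = surface.copy()
shadowed[:, 4:] *= 0.4
shadow_mask = threshold_crack_mask(shadowed)

print("flagged pixels, even lighting:", int(clean_mask.sum()))
print("flagged pixels, with shadow:  ", int(shadow_mask.sum()))
```

Under even lighting only the crack column is flagged; with the shadow, the whole dimmed region falls below the threshold and is flagged as well, which is the kind of false alarm that learned models aim to avoid.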

A Hybrid AI Model That Sees More Clearly
The authors designed a hybrid deep learning model that combines several strengths in one pipeline. First, a compact network called MobileNet V3 looks at the image and pulls out local details such as edges, micro-cracks, and texture. Next, a transformer model called LeViT analyzes how different parts of the image relate to each other, capturing long-range patterns such as how a thin crack snakes across a slab. A third component, an improved Linformer transformer, models these long-range relationships at reduced computational cost, even in high-resolution images, which makes it practical for small devices.
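The computational saving behind Linformer-style attention can be sketched in a few lines of NumPy. This is the general Linformer idea (shrinking the keys and values along the sequence axis before attention), not the authors' improved variant, whose details are not given here. The sizes, the random matrices standing in for learned projections, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(q, k, v, e_proj, f_proj):
    """Attention with keys/values compressed from n tokens to k_len rows.

    q, k, v: (n, d) arrays; e_proj, f_proj: (k_len, n) projection matrices.
    The score matrix is (n, k_len) instead of (n, n).
    """
    k_small = e_proj @ k                      # (k_len, d)
    v_small = f_proj @ v                      # (k_len, d)
    scores = q @ k_small.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v_small, weights

rng = np.random.default_rng(0)
n, d, k_len = 16, 8, 4                        # assumed toy sizes
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
# Random matrices stand in for projections that would normally be learned.
e_proj = rng.normal(size=(k_len, n)) / np.sqrt(n)
f_proj = rng.normal(size=(k_len, n)) / np.sqrt(n)

out, weights = linformer_attention(q, k, v, e_proj, f_proj)
print(out.shape, weights.shape)
```

With full attention the score matrix would have n × n entries; here it has n × k_len, so the cost grows linearly with the number of image patches when k_len is held fixed.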
Mixing Signals and Making a Final Call
Instead of simply stacking these components, the system uses a “gated feature fusion” step that learns which pieces of information from each network truly matter and which are redundant. This helps the model keep useful clues about crack width, length, and continuity while ignoring distracting background patterns. The fused signal is then passed to a Kolmogorov–Arnold Network, a special type of neural network that represents complex relationships using flexible mathematical curves. This classifier is tuned to draw a sharp boundary between “crack” and “no crack” cases, even when the patterns in the data are subtle or messy, while staying fast and compact enough for real-time use on edge hardware such as smartphones or embedded boards.
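The two ideas in this step, a learned gate that blends feature streams and a classifier built from learnable one-dimensional curves, can be sketched as follows. This is a simplified illustration under stated assumptions and is not the authors' implementation: it fuses only two streams, uses random weights in place of trained ones, and builds each curve from Gaussian bumps rather than the B-splines commonly used in Kolmogorov–Arnold Networks. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, w_gate, b_gate):
    """Blend two feature vectors with a per-dimension gate in (0, 1)."""
    gate = sigmoid(np.concatenate([feat_a, feat_b]) @ w_gate + b_gate)
    return gate * feat_a + (1.0 - gate) * feat_b, gate

def kan_layer(x, coeffs, centers, width=0.5):
    """KAN-style layer: out[j] = sum_i phi_ij(x[i]).

    Each phi_ij is a 1-D curve formed as a weighted sum of Gaussian bumps
    (an assumed stand-in for B-splines).
    x: (d_in,), coeffs: (d_in, d_out, n_basis), centers: (n_basis,)
    """
    basis = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
    return np.einsum("ib,iob->o", basis, coeffs)

rng = np.random.default_rng(0)
d = 4                                          # assumed feature size
local_feat = rng.normal(size=d)                # e.g. CNN-style local features
global_feat = rng.normal(size=d)               # e.g. transformer-style features
w_gate = rng.normal(size=(2 * d, d))           # untrained, for illustration
b_gate = np.zeros(d)

fused, gate = gated_fusion(local_feat, global_feat, w_gate, b_gate)

centers = np.linspace(-2.0, 2.0, 5)
coeffs = rng.normal(size=(d, 2, 5))            # two outputs: no crack / crack
logits = kan_layer(fused, coeffs, centers)
label = ["no crack", "crack"][int(np.argmax(logits))]
print(gate.round(2), label)
```

Because the gate lies strictly between 0 and 1, each fused value is a weighted average of the two inputs for that dimension, which is how the model can lean on one stream where it is informative and on the other elsewhere.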

Opening the AI Black Box
Because infrastructure safety depends on trust, the authors also focus on making the model’s decisions understandable. They apply two explanation tools—SHAP and LIME—to highlight which image regions and features most influenced a given prediction. When the model detects a crack, these tools typically emphasize the crack path and its immediate surroundings, confirming that the system is “looking” at the right places rather than being misled by stains or shadows. During development, these explanations also exposed weaknesses, such as a tendency to react to painted lines on asphalt, which led the team to adjust the training process and cut false alarms.
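The intuition behind perturbation-based explanations can be shown with a simple occlusion test: cover one patch of the image at a time and record how much the model's score drops. This is neither SHAP nor LIME, only a simpler relative that shares the idea of probing a model by altering its input. The `toy_crack_score` function is a hypothetical stand-in for a trained classifier, and the image is synthetic.

```python
import numpy as np

def toy_crack_score(img):
    """Hypothetical classifier stand-in: fraction of dark pixels."""
    return float((img < 0.3).mean())

def occlusion_map(img, score_fn, patch=2, fill=0.8):
    """Score drop caused by covering each patch with a neutral fill value."""
    base = score_fn(img)
    h, w = img.shape
    heat = np.zeros((h // patch, w // patch))
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            occluded = img.copy()
            occluded[r:r + patch, c:c + patch] = fill
            heat[r // patch, c // patch] = base - score_fn(occluded)
    return heat

# Synthetic 8x8 surface (assumed values) with a dark crack in column 3.
image = np.full((8, 8), 0.8)
image[:, 3] = 0.1

heat = occlusion_map(image, toy_crack_score)
print(heat)
```

Only the patches that overlap the crack column change the score, so the heat map lights up along the crack path. A map that instead lit up over stains or painted lines would signal the kind of weakness described above.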
How Well It Works and Why It Matters
Tested on large and varied collections of concrete and asphalt images (over 40,000 photos from multiple public datasets), the model reached about 99.5% accuracy and maintained strong performance on images it had never seen before. It also ran with fewer calculations and less memory than many competing approaches, making it suitable for integration into consumer electronics, drones, and low-cost inspection systems. This means homeowners, facility managers, and city engineers could one day use ordinary smart cameras or mobile apps to continuously monitor surfaces and flag early crack formation, turning structural care from a rare, manual event into a routine, data-driven safeguard.
Looking Ahead to Safer Structures
In simple terms, the study shows that a carefully designed blend of lightweight networks, efficient transformers, and an advanced classifier can reliably tell cracked from intact surfaces while explaining why it reached that verdict. There are still open challenges—such as dealing with extreme lighting or very limited device power—but the work points toward a future where buildings, bridges, and pavements can be watched over automatically, helping prevent small flaws from growing into dangerous failures.
Citation: Wahab Sait, A.R., Sankaranarayanan, S. & Yu, Y. Vision transformers- Kolmogorov–Arnold networks-based consumer driven surface cracks classification model. Sci Rep 16, 9183 (2026). https://doi.org/10.1038/s41598-026-40359-z
Keywords: infrastructure monitoring, concrete cracks, asphalt pavement, deep learning, computer vision