Clear Sky Science · en

BarkVisionAI: Novel dataset for rapid tree species identification

Why tree bark and phone cameras matter

When we walk through a forest, we usually notice leaves, flowers, or towering canopies. But for much of the year—or in dense, shaded woods—those clues are missing. This study shows that the rough, patterned skin of trees—their bark—combined with everyday smartphone cameras and modern artificial intelligence, can become a powerful tool for quickly identifying tree species and tracking the health of forests across India and, potentially, the world.

Figure 1.

A new way to see forests

The researchers behind BarkVisionAI set out to fill a major gap in how we recognize trees. Most existing photo collections for tree identification focus on leaves or other visible parts, and the few bark image datasets tend to be small, drawn from limited regions, and shot under nearly identical conditions. That makes it hard for computer models trained on them to work in messy, real forests. BarkVisionAI changes this by assembling 156,001 bark photos of 13 important tree species across diverse forest types and ecological regions in India. Each picture is more than just an image: it is linked to precise location, time, and camera information, creating a rich resource for both ecology and artificial intelligence.

How the images were gathered

Collecting this many useful photos required close collaboration with forestry staff and tailored fieldwork in two Indian states, Himachal Pradesh and Odisha, which together capture eight major forest types and nine ecological regions. Forest guards and officers were trained to use a digital data collection platform on their phones, learning how to stand a set distance from the trunk, hold the camera perpendicular to the bark, and record accurate locations. Data collection ran from January to December 2024, spanning dry seasons, monsoon, and winter. Images were taken in the morning, afternoon, and evening, under different light and weather, and using 315 distinct camera models from 20 manufacturers. This deliberate variation ensures the dataset reflects the real-world challenges of working in forests rather than the controlled conditions of a lab.

Turning messy reality into a fair test

Real forests introduce many subtle biases: perhaps one species is photographed mostly with a specific phone, at a certain time of day, or at one elevation. A naïve AI model could “cheat” by learning these shortcuts instead of the true bark patterns. To avoid that trap, the team designed a careful selection process. From the full collection, they built a balanced subset of 36,400 images, with exactly 2,800 photos per species. Each species’ images were spread across elevation levels, seasons, leaf conditions (whether the tree canopy was in full leaf or bare), times of day, and camera models. These factors were combined into a fine-grained grid, and images were sampled so that no single lighting condition, device, or altitude would dominate. The result is not just a large dataset, but one crafted to push AI systems to pay attention to the bark itself.
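For readers who want to see the idea in code, here is a minimal sketch of grid-based balanced sampling. It is not the authors' pipeline: the function name `balanced_sample`, the record fields (`species`, `time_of_day`, `id`), and the round-robin rule are all assumptions chosen for illustration, and the toy records are made up. The sketch groups each species' images into grid cells, then takes one image per cell in turn until the per-species quota is met.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(records, per_species, strata_keys, seed=0):
    """Pick `per_species` records per species, rotating across grid cells.

    Hypothetical helper, not the authors' code. Each record is a dict with a
    "species" key plus one key per stratification factor.
    """
    rng = random.Random(seed)

    # species -> grid cell (tuple of factor values) -> records in that cell
    grid = defaultdict(lambda: defaultdict(list))
    for rec in records:
        cell = tuple(rec[key] for key in strata_keys)
        grid[rec["species"]][cell].append(rec)

    selected = []
    for species in sorted(grid):
        pools = [list(grid[species][cell]) for cell in sorted(grid[species])]
        for pool in pools:
            rng.shuffle(pool)

        # Round-robin: take one image from each non-empty cell per pass, so a
        # heavily photographed cell cannot crowd out sparsely covered ones.
        picked = []
        while len(picked) < per_species and any(pools):
            for pool in pools:
                if pool and len(picked) < per_species:
                    picked.append(pool.pop())

        if len(picked) < per_species:
            raise ValueError(f"only {len(picked)} images available for {species}")
        selected.extend(picked)
    return selected


# Toy, made-up records: species "a" is photographed mostly in the evening.
records = (
    [{"species": "a", "time_of_day": "evening", "id": i} for i in range(90)]
    + [{"species": "a", "time_of_day": "morning", "id": i} for i in range(90, 100)]
    + [{"species": "b", "time_of_day": "evening", "id": i} for i in range(100, 150)]
    + [{"species": "b", "time_of_day": "morning", "id": i} for i in range(150, 200)]
)
subset = balanced_sample(records, per_species=20, strata_keys=["time_of_day"])
counts = Counter((r["species"], r["time_of_day"]) for r in subset)
print(counts)
```

In this toy run, species "a" has nine times as many evening photos as morning ones, yet the sampled subset holds ten of each. A real grid would pass several factors in `strata_keys` (for example elevation band, season, and camera model), and sparse cells would simply run out early while fuller cells make up the remainder.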

Figure 2.

Putting artificial intelligence to the test

With this balanced dataset in hand, the researchers trained several popular image-recognition models, including well-known convolutional neural networks and a modern “vision transformer” model. All images were resized to standard dimensions, then split into training, validation, and testing sets. Among the models, a network known as ResNet50 performed best, correctly identifying species for about 87% of the test images. A closer look showed that accuracy still slipped under more difficult conditions—especially in low evening light and at higher elevations where environments are more complex. These patterns confirmed that lighting, season, and altitude are real obstacles for AI, and that controlling for these factors in the dataset was essential to reveal where models genuinely struggle.
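The "closer look" described above amounts to computing accuracy separately for each condition rather than as one overall number. The sketch below shows one simple way to do that. The function name `accuracy_by_condition`, the field names, and the four sample predictions are hypothetical and made up for illustration; they are not results from the study.

```python
from collections import defaultdict


def accuracy_by_condition(predictions, condition_key):
    """Return {condition value: fraction of correct predictions}.

    Hypothetical helper: each row is a dict holding the true label
    ("actual"), the model's guess ("predicted"), and condition metadata.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in predictions:
        group = row[condition_key]
        totals[group] += 1
        if row["predicted"] == row["actual"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Made-up predictions for illustration only; not results from the study.
predictions = [
    {"actual": "species_a", "predicted": "species_a", "time_of_day": "morning"},
    {"actual": "species_b", "predicted": "species_b", "time_of_day": "morning"},
    {"actual": "species_a", "predicted": "species_a", "time_of_day": "evening"},
    {"actual": "species_b", "predicted": "species_a", "time_of_day": "evening"},
]
per_time = accuracy_by_condition(predictions, "time_of_day")
print(per_time)
```

Passing a different `condition_key`, such as an elevation band or season field, gives the same breakdown along another axis. A gap between groups is only meaningful when each group holds enough test images, which is one reason a balanced test set helps.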

What this means for forests and future tools

BarkVisionAI demonstrates that everyday tools—a smartphone and a walk in the woods—can feed a sophisticated system for rapid tree identification. For conservationists and forest managers, this opens the door to faster mapping of species, better tracking of biodiversity, and more timely monitoring of environmental change. For AI researchers, the dataset represents a demanding benchmark that captures subtle textures, shifting seasons, and diverse devices, highlighting that bark-based recognition is far from a solved problem. The study’s main message for non-specialists is clear: by carefully designing both data and algorithms, we can teach machines to read the stories written in tree bark, helping us understand and protect forests more effectively.

Citation: Chhatre, A., Saini, N., Parmar, A.K. et al. BarkVisionAI: Novel dataset for rapid tree species identification. Sci Data 13, 343 (2026). https://doi.org/10.1038/s41597-026-06711-8

Keywords: tree identification, forest monitoring, biodiversity, computer vision, India forests