Clear Sky Science
Predicting congregational and crowd spread-out flow using YOLOv4 and DeepSORT
Why watching crowds from above matters
When millions of people gather in one place, a simple stumble or sudden rush can turn dangerous in seconds. The annual Hajj and Umrah pilgrimages in Saudi Arabia draw up to four million worshippers, creating some of the densest crowds on Earth. This paper explores how artificial intelligence can watch these vast moving crowds through cameras, automatically count people, follow their movement, and warn authorities before dangerous congestion builds up.
Big gatherings, big risks
Traditional crowd control relies on human observers, fixed barriers, and carefully planned routes. But human eyes tire, and crowds behave in unexpected ways. During Hajj, worshippers move between key sacred sites along walkways, roads, and open plazas that can quickly become bottlenecks. The authors argue that to keep people safer, officials need tools that can see the whole picture in real time: where people are dense, where they are thinning out, and how quickly they are entering or leaving a space.
Teaching computers to see people
To build such a tool, the researchers combine two advanced computer vision methods. The first, called YOLOv4, is trained to spot people in images by drawing a box around each person, even in tightly packed scenes. The second, called DeepSORT, takes those detections and follows each person across many video frames, assigning each one a unique ID so that their path can be traced over time. The team assembled a large collection of images and video from the 2019 Hajj, taken in several areas around Mount Arafat. They carefully labeled tens of thousands of human heads and bodies, removed blurry footage, and augmented the data with small variations so that the system would remain reliable under different lighting, camera angles, and crowd densities.
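To make the detect-then-track idea concrete, here is a minimal Python sketch of tracking-by-detection. It is not DeepSORT and not the authors' code: it assumes each frame's detector output is a list of boxes given as (x1, y1, x2, y2) and links boxes between consecutive frames by greedy overlap (intersection-over-union). DeepSORT itself adds a Kalman-filter motion model, a learned appearance descriptor and optimal (Hungarian) assignment, which is what helps it hold on to people through occlusion. The class name SimpleTracker and the 0.3 overlap threshold are hypothetical.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class SimpleTracker:
    """Greedy IoU tracker: a toy stand-in for DeepSORT's association step."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold  # hypothetical value
        self.tracks = {}                    # track ID -> last known box
        self._next_id = count(1)            # IDs 1, 2, 3, ...

    def update(self, detections):
        """Match this frame's boxes to existing tracks; return {ID: box}."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned, used_tracks, used_dets = {}, set(), set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same person
            if tid in used_tracks or j in used_dets:
                continue
            assigned[tid] = detections[j]
            used_tracks.add(tid)
            used_dets.add(j)
        # Unmatched detections start new tracks with fresh IDs.
        for j, det in enumerate(detections):
            if j not in used_dets:
                assigned[next(self._next_id)] = det
        # Unmatched tracks are dropped here; DeepSORT instead keeps them
        # alive for a number of frames so short occlusions do not break a path.
        self.tracks = assigned
        return dict(assigned)


tracker = SimpleTracker()
print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))
# → {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
print(tracker.update([(52, 51, 62, 61), (1, 1, 11, 11)]))
# → {1: (1, 1, 11, 11), 2: (52, 51, 62, 61)}
```

In the second frame the detector lists the two people in a different order, yet each keeps its ID because its new box overlaps its old one. Pure overlap matching is exactly what fails when pilgrims in similar white clothing cross paths, which is why DeepSORT adds motion prediction and appearance cues.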

From moving dots to crowd levels
Once the system can find and follow individuals, it can turn these moving dots into a picture of how the crowd behaves. By counting how many people enter and leave a given area and how tightly they are packed, the system classifies crowd density into three intuitive levels: low, medium, and high. Instead of relying on rough estimates or delayed reports, managers can see where people are spreading out smoothly and where critical choke points are forming. Because DeepSORT is designed to cope with people blocking each other from view and looking very similar (as in pilgrims’ mostly white clothing), it can maintain stable tracks even in dense, visually confusing scenes.
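The counting and classification step can be sketched in a few lines of Python. This sketch assumes tracked positions are available per frame as {track ID: (x, y)} and that a horizontal virtual line across a walkway marks its entrance. The function names, the line position and the density cut-offs of 2 and 4 people per square metre are illustrative assumptions; the summary does not state which thresholds the authors used.

```python
def count_crossings(prev_centroids, curr_centroids, line_y):
    """Count tracked people crossing a horizontal virtual line at y = line_y.

    Centroids are {track_id: (x, y)} for two consecutive frames.
    Moving down the image (increasing y) counts as entering, up as exiting.
    """
    entered = exited = 0
    for tid, (_, y_now) in curr_centroids.items():
        if tid not in prev_centroids:
            continue  # a brand-new track has no previous position to compare
        y_before = prev_centroids[tid][1]
        if y_before < line_y <= y_now:
            entered += 1
        elif y_now < line_y <= y_before:
            exited += 1
    return entered, exited


def classify_density(people_count, area_m2, low_max=2.0, medium_max=4.0):
    """Map people per square metre to 'low', 'medium' or 'high'.

    The cut-offs are hypothetical placeholders, not values from the study.
    """
    density = people_count / area_m2
    if density < low_max:
        return "low"
    if density < medium_max:
        return "medium"
    return "high"


# Example: one frame-to-frame update for a monitored walkway.
prev = {1: (5, 90), 2: (20, 110), 3: (40, 50)}
curr = {1: (5, 105), 2: (20, 95), 3: (40, 60)}
entered, exited = count_crossings(prev, curr, line_y=100)
occupancy = 30 + entered - exited  # assume 30 people were already inside
print(entered, exited, classify_density(occupancy, area_m2=10.0))
# → 1 1 medium
```

Keeping a running occupancy from entries and exits, rather than recounting every head in each frame, is one way such a system can report whether an area is filling up or spreading out.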
How well the system performs
The authors tested their setup thoroughly. They compared several versions of the YOLO family as well as different tracking methods, and found that YOLOv4 paired with DeepSORT performed best on real Hajj footage. After tuning the models and training them on the curated dataset, YOLOv4 detected people with over 95% accuracy while striking a strong balance between missed detections and false alarms. DeepSORT tracked individuals with more than 91% accuracy, recovering their paths even when they were briefly hidden behind others. Compared with similar systems used for traffic, social-distancing monitoring, or other crowd scenes, this Hajj-focused approach matched or exceeded the best reported results while working in one of the most challenging environments.
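A balance between missed detections and false alarms is what detection studies usually report as an F1 score, and tracking accuracy is commonly reported with the CLEAR MOT measure MOTA. Assuming those standard definitions (the summary does not say exactly which metrics underlie its figures), both can be computed as below; the counts in the example are invented for illustration and are not taken from the paper.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F1 from true positives, false alarms and misses."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def mota(false_negatives, false_positives, id_switches, ground_truth):
    """CLEAR MOT multi-object tracking accuracy, with counts summed over frames."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be a positive number of objects")
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth


p, r, f1 = detection_scores(tp=950, fp=30, fn=50)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# → precision=0.969 recall=0.950 F1=0.960
print(f"MOTA={mota(50, 30, 10, 1000):.2f}")
# → MOTA=0.91
```

Because F1 is a harmonic mean, it stays high only when both misses and false alarms are rare, and MOTA additionally penalises identity switches, the errors that matter most when following one pilgrim through a dense crowd.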

What this could mean on the ground
In practice, such a system could sit behind existing surveillance cameras and continuously monitor how pilgrims move. When the number of people in a walkway nears its safe limit, or when a plaza begins to fill unevenly, the software could alert officials to adjust barriers, redirect flows, or send messages to volunteers on the ground. Beyond safety, the same insights could guide the placement of medical teams, washrooms, and transport links, and could help planners redesign routes for future seasons based on real data rather than guesswork. The authors also note that the same approach could assist at major sporting events, concerts, or festivals.
A smarter, safer way to guide the masses
For a layperson, the key takeaway is simple: computers can now watch huge crowds more carefully and consistently than any human team, turning raw video into early warnings and practical guidance. By combining person detection and tracking into one robust system, this research shows that it is possible to monitor the flow of millions of worshippers in real time, classify how crowded each area is, and act before situations become dangerous. If developed further and deployed responsibly, such tools could make large religious gatherings and other mass events safer, smoother, and less stressful for everyone involved.
Citation: Aljojo, N., Ardah, H., Alamri, A. et al. Predicting congregational and crowd spread-out flow using YOLOv4 and DeepSORT. Sci Rep 16, 13869 (2026). https://doi.org/10.1038/s41598-026-44719-7
Keywords: crowd management, computer vision, Hajj safety, object tracking, deep learning