Clear Sky Science
A Blue Start: A large-scale pairwise and higher-order social network dataset
Why this matters for everyday online life
Social media isn’t just a jumble of individual friendships and follows; it’s also made of groups, bundles, and crowds that shape what we see and how ideas spread. This paper introduces a massive new dataset from the Bluesky platform that captures both one‑to‑one “follow” ties and richer group structures called starter packs. By opening up this kind of information, the authors give researchers an unprecedented look at how online communities form, grow, and react to real‑world events—from policy changes on rival platforms to political turning points.

From follows to groups
Traditional social network studies treat relationships as pairs: one person follows another, one account replies to another. But many of our real online experiences are organized around groups—lists of people to follow, collections of recommended accounts, or curated bundles of content. The authors focus on Bluesky’s “starter packs,” user‑made collections of accounts and feeds that help newcomers rapidly build their timelines. Unlike simple follow links, each starter pack can include dozens or even hundreds of accounts at once, making them a natural way to study group‑level behavior rather than just individual friendships.
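The difference between the two views can be made concrete with a small sketch. The account codes, pack names, and the choice of Jaccard overlap below are illustrative assumptions, not taken from the dataset or the paper.

```python
from collections import Counter

# Pairwise view: each follow is an ordered (follower, followed) pair.
# All account codes here are made-up toy values.
follows = [(0, 1), (2, 1), (3, 1), (0, 2)]

# Higher-order view: each starter pack is a whole set of member accounts.
# Pack names are hypothetical labels used only for this example.
starter_packs = {
    "pack_a": {1, 2, 3},
    "pack_b": {2, 3, 4},
    "pack_c": {3, 5},
}

# How many followers each account has (pairwise importance).
follower_counts = Counter(followed for _, followed in follows)

# How many starter packs each account appears in (group-level importance).
pack_memberships = Counter(
    member for members in starter_packs.values() for member in members
)

def jaccard(a: set, b: set) -> float:
    """Share of accounts two packs have in common, relative to their union."""
    return len(a & b) / len(a | b)

overlap_ab = jaccard(starter_packs["pack_a"], starter_packs["pack_b"])
```

In this toy example account 1 has the most followers but sits in only one pack, while account 3 has no followers yet appears in every pack, which is the kind of contrast a pairwise-only view would miss.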
Building a map of a new platform
To assemble the dataset, the team tapped into Bluesky’s open technical infrastructure. Every account has a long‑term identifier stored in a public directory, and user activity lives on personal data servers that can be queried through an open API. The authors systematically walked through this infrastructure: first exporting all known identifiers and their creation times, then asking each personal data server for the list of accounts it hosts, and finally downloading each reachable user’s full activity record. From those raw logs they extracted two core ingredients: who follows whom, and which accounts appear together in starter packs.
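Walking through a server's account list usually means following a cursor from one page of results to the next. The sketch below shows that pattern only; the `fetch_page` callable, the `items` and `cursor` field names, and the fake server are all hypothetical stand-ins, since this summary does not give the real endpoint names or response shapes the authors used.

```python
from typing import Callable, Iterator, Optional

def iterate_accounts(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[str]:
    """Yield every account from a cursor-paginated listing.

    `fetch_page(cursor)` is assumed to return a dict with an "items" list
    and an optional "cursor" for the next page (hypothetical shape).
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("cursor")
        if not cursor:  # no cursor means the listing is exhausted
            break

def fake_fetch_page(cursor: Optional[str]) -> dict:
    """Stand-in for a network call so the example runs offline."""
    pages = {
        None: {"items": ["acct-1", "acct-2"], "cursor": "p2"},
        "p2": {"items": ["acct-3"], "cursor": None},
    }
    return pages[cursor]

accounts = list(iterate_accounts(fake_fetch_page))
```

Keeping the network call behind a plain callable means the same loop could be pointed at a real server or at a recorded response for testing.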
Protecting users while keeping structure
Because this work exposes the shape of millions of people’s social connections, the authors took steps to reduce the risk of identifying individuals. Instead of publishing the original account identifiers, they replaced every user and starter pack with anonymous integer codes. They also stripped out descriptive text like starter‑pack names and rounded all timestamps to the nearest day. Even with these safeguards, the basic wiring of the network is preserved: the same anonymous code appears consistently across the account list, the follow network, and the starter‑pack data, allowing researchers to study structure and dynamics without directly seeing who any person is.
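A minimal sketch of this kind of anonymization might look as follows. The `Anonymizer` class, the example identifiers, and the exact rounding rule (nearest UTC day, with noon rounding up) are assumptions made for illustration; the summary does not describe the authors' actual procedure in this detail.

```python
from datetime import date, datetime, timedelta, timezone

class Anonymizer:
    """Assigns each identifier a stable integer code on first sight."""

    def __init__(self) -> None:
        self._codes: dict[str, int] = {}

    def code(self, identifier: str) -> int:
        # The same identifier always maps to the same code, so the code
        # stays consistent across account, follow, and pack tables.
        return self._codes.setdefault(identifier, len(self._codes))

def round_to_day(ts: datetime) -> date:
    """Round a timezone-aware timestamp to the nearest UTC calendar day."""
    shifted = ts.astimezone(timezone.utc) + timedelta(hours=12)
    return shifted.date()

# Toy follow records with made-up identifiers.
raw_follows = [
    ("id:alice", "id:bob", datetime(2024, 11, 6, 13, 0, tzinfo=timezone.utc)),
    ("id:carol", "id:bob", datetime(2024, 11, 6, 11, 0, tzinfo=timezone.utc)),
]

anon = Anonymizer()
anonymized_follows = [
    (anon.code(src), anon.code(dst), round_to_day(ts))
    for src, dst, ts in raw_follows
]
```

Here "id:bob" receives the same code in both rows, so the structure of the follow network survives even though the original identifiers are gone.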

What the data reveal about Bluesky
The resulting snapshot is enormous: roughly 39.7 million accounts, 2.4 billion follow relationships, and 365,842 starter packs involving about 2 million unique users and feeds. Most users never create a starter pack, and those who do typically make just one. Pack sizes cluster around Bluesky’s design choices: the minimum and maximum allowed sizes, plus an automatic feature that pre‑fills a pack with about fifty accounts. The authors show that almost all users are tied together in a single gigantic web of follows, while the starter‑pack network has a huge overlapping core where many packs share the same accounts. Spikes in both account creation and following line up with key events, such as changes to the rival X/Twitter platform or major political dates, suggesting that people move and connect in response to broader news and policy shifts.
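One way to check a claim like "almost all users are tied together" is to measure what share of accounts falls in the largest connected component. The sketch below uses a union-find structure and ignores follow direction; both choices are assumptions for illustration, and the toy edge list is not drawn from the dataset.

```python
from collections import Counter

def largest_component_fraction(num_nodes: int, edges: list) -> float:
    """Fraction of nodes in the largest component, ignoring edge direction."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    sizes = Counter(find(node) for node in range(num_nodes))
    return max(sizes.values()) / num_nodes

# Toy follow network: accounts 0-3 are linked, accounts 4-5 form a separate pair.
toy_follows = [(0, 1), (1, 2), (3, 2), (4, 5)]
fraction = largest_component_fraction(6, toy_follows)
```

On data at the scale described here, a dedicated graph library or an out-of-core approach would likely be needed; this version is only meant to show the idea.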
Why groups add something new
One of the paper’s key findings is that the “most important” accounts look different depending on whether you measure importance by follows or by starter‑pack membership. An account that appears in huge numbers of starter packs is not always the one with the most followers, and vice versa. Statistical comparisons confirm only moderate agreement between the two rankings, meaning that group‑based and pairwise views offer complementary insights. This dual perspective lets researchers ask questions that were previously out of reach, such as how curated groups help newcomers integrate into a platform, how overlapping groups shape information flows, or how online communities reorganize during moments of crisis.
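Agreement between two rankings is often summarized with a rank correlation. The sketch below computes Spearman's rho by hand; this summary does not say which statistic the authors used, so the choice of Spearman and the toy counts are assumptions for illustration.

```python
def average_ranks(values: list) -> list:
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x: list, y: list) -> float:
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x: list, y: list) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: follower counts and starter-pack memberships for five accounts.
followers = [100, 50, 10, 5, 1]
pack_counts = [3, 9, 1, 4, 0]
rho = spearman(followers, pack_counts)
```

For these toy counts rho comes out at 0.5: the two rankings are related but far from identical, which is the sense of "moderate agreement" described above.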
What this work means going forward
For non‑specialists, the core message is that online social life can’t be fully understood by counting followers alone. The “A Blue Start” dataset shows how group structures like starter packs help knit a new platform together, and how they respond to big outside events. By making this giant, carefully anonymized map of Bluesky publicly available, the authors provide a foundation for future research on everything from misinformation and political talk to recommendation algorithms and digital public squares. In short, the paper concludes that capturing both individual ties and groupings is essential if we want to understand, and ultimately guide, the health of our online social worlds.
Citation: Smith, A.H., Amburg, I., Kumar, S. et al. A Blue Start: A large-scale pairwise and higher-order social network dataset. Sci Data 13, 585 (2026). https://doi.org/10.1038/s41597-026-06920-1
Keywords: Bluesky social network, starter packs, higher-order networks, online communities, social media datasets