Clear Sky Science · en
Efficient multi-party private set union resistant to maximum collusion attacks
Why sharing blacklists without sharing secrets matters
Many organizations need to collaborate on data without exposing the sensitive details behind it. Think of banks that want to pool their blacklists of suspicious accounts, or companies that wish to combine fraud indicators, while keeping customer information and internal methods secret. This paper explores how several parties can compute the union of their private lists—everything that appears in at least one list—so that everyone learns only the final combined list, but nothing else about one another’s data. It focuses on making such collaboration both highly secure against collusion and efficient enough for very large datasets.

Bringing many private lists together
The task studied here is called multi‑party private set union. Each participant holds a confidential collection of items, such as account numbers or identifiers, and they want to learn the union of all items across all parties. No one should learn which party contributed which item, nor see any item that is not in the final union. Existing approaches either rely on heavy public‑key cryptography or general secure computation frameworks, which become impractical when sets contain millions of entries or when many parties are involved. A recent protocol by Gao and colleagues improved scalability and could resist “maximum collusion,” meaning it stayed secure even if all but one participant secretly cooperated, but it still required transmitting enormous volumes of bulky encrypted messages.
Cutting the bandwidth bill with a smarter mix of tools
The authors propose a new protocol that keeps Gao et al.’s strong collusion resistance while sharply reducing communication and speeding up runtime. Their key idea is to reserve expensive public‑key encryption for only the most critical pieces—the items that must stay encrypted and later be jointly decrypted—while using lighter, faster symmetric‑key encryption for the bulk of intermediate data. They design a two‑phase “one‑leader” protocol in which all parties prepare hash‑based data structures over their sets, then interact so that only a designated leader finally learns the union. A central building block, called batched equality‑tested oblivious random generation, allows two parties to generate matching random masks whenever their hidden items are equal, without revealing which items matched. This supports efficient, privacy‑preserving filtering of duplicates as information flows through the protocol.
From a single leader to a fairer shared result
In some collaborations, having a single leader who alone receives the result can be risky or politically unacceptable. The authors therefore extend their design to a “leaderless” version in which every party obtains the union directly. They keep the same preparation work, but repeat the interaction phase multiple times, rotating which party plays the leader role in each run. Each round produces the union for a different participant, using joint decryption and shuffling so that no one can trace items back to their original owners. This leaderless design improves robustness and fairness—no single party can withhold the result or become a bottleneck—at the cost of additional communication, which is roughly multiplied by the number of parties.

How the new protocol scales in the real world
The team implemented their protocols and compared them against the best previous scheme under a variety of set sizes and numbers of participants. They instantiate the public‑key component with threshold ElGamal encryption and plug in existing libraries for hashing, equality tests, and basic cryptographic primitives. Across realistic parameter choices, their one‑leader protocol cuts communication volume by about four to five times and accelerates runtime by about 1.3 to 1.8 times, depending on the number of parties and items. The leaderless protocol naturally uses more bandwidth, but still benefits from the same design efficiencies, and its decryption steps can be overlapped so that total runtime grows more slowly than its communication cost.
What this means for privacy‑preserving collaboration
To a non‑specialist, the takeaway is that this work makes it more practical for many independent organizations to combine sensitive lists without revealing their raw data, even if nearly all of them were to collude. By carefully choosing when to use heavy‑duty encryption and when lighter methods suffice, the authors deliver protocols that are significantly leaner in bandwidth yet maintain strong security guarantees. Their leaderless variant further removes dependence on any single party to distribute results. Together, these advances move privacy‑preserving data union from a mostly theoretical exercise toward a deployable tool for sectors like finance, cybersecurity, and data analytics.
Citation: Liu, Q., Lee, JW. Efficient multi-party private set union resistant to maximum collusion attacks. Sci Rep 16, 13230 (2026). https://doi.org/10.1038/s41598-026-41069-2
Keywords: private set union, secure data collaboration, multiparty computation, privacy-preserving encryption, collusion-resistant protocols