Clear Sky Science · en
Towards an improved efficient leakage-resilient enhanced private set union
Why sharing lists can threaten privacy
Many organizations keep sensitive lists—such as suspicious IP addresses, customer IDs, or medical study participants—that they would like to combine with another party’s list without exposing their own data. A tool called private set union promises exactly that: it lets two sides learn the combined list of unique items, but nothing more. This paper shows that even cutting‑edge versions of this tool can quietly leak extra information while they run, and introduces a new design that keeps the benefits while sharply reducing these hidden risks and the computing cost.

What private set union is trying to protect
Imagine two companies comparing cyber‑attack blacklists. Each wants to end up with the full list of all suspicious IP addresses seen by either side, so they can better defend their networks. At the same time, each company’s detection methods—and therefore its exact blacklist—are trade secrets. If one could infer which addresses the other has or does not have, it might uncover those methods. Classic private set union protocols already hide the direct overlap between the lists, but recent research has revealed that they may still reveal clues during the computation itself or through patterns in how items are arranged in internal data structures.
Hidden leaks in fast earlier methods
Earlier scalable schemes relied on a recipe that first checked, item by item, whether each element from one list appeared in the other, and then used those answers to deliver only the “new” items. Later work showed that this process inherently reveals, before the protocol finishes, how many items the lists share. A curious participant can exploit this by aborting and rerunning the protocol with slightly tweaked inputs, gradually learning which specific items overlap. Other fast schemes used hashing—placing items into bins according to a hash function—to organize data. Once one party learns which of the other’s items are unique, it can cross‑check the pattern of filled and empty bins to deduce which of its own items definitely do not appear in the other list, a form of “hash‑based” leakage.
Locking items behind random disguises
The new protocol tackles both of these problems at once. Before any hashing happens, each party runs a cryptographic exchange that turns every item into a random‑looking token. The crucial property is that identical items from the two lists are turned into identical tokens, while different items produce unrelated ones—and neither side learns the secret key that links tokens back to real values. These disguised tokens are then placed into hash‑based tables and passed through a series of carefully structured, randomized steps that decide, in effect, whether tokens match, without revealing which token sits in which bin. Repeating this process with fresh randomness in every run prevents an attacker from correlating information across multiple executions.

Cutting cost with a smarter data structure
Security alone is not enough if a protocol is too heavy to use at scale. The authors therefore redesign one of the most expensive components: an internal module that previously relied on a batched cryptographic primitive to compare many items at once. They replace it with a “bidirectional” oblivious key‑value store, a compact structure that lets one party encode key–value pairs so that the other can query them without learning anything beyond whether a key is present. By arranging two such encodings to play off one another, the protocol can tell when tokens line up across the two lists while avoiding work on empty or dummy bins. This change slashes both the amount of data sent over the network and the time spent on computation, especially for large lists.
What the experiments show in practice
To test their ideas, the authors implemented their protocol and compared it with the best existing enhanced private set union design under the same, stricter privacy goals. Across a wide range of list sizes and network conditions, their method consistently reduced communication by about 1.1 to 3 times and sped up running time by roughly 1.0 to 1.7 times. Importantly, these gains hold even after adding the extra cryptographic layer that prevents hash‑based leakage, which earlier efficient schemes ignored. The results suggest that stronger protection need not come with a large performance penalty.
Why this matters for real‑world data sharing
In plain terms, this work shows how two parties can combine sensitive lists while sharply limiting what each can infer about the other’s data—even from subtle side effects during the protocol. By disguising items before hashing and using more frugal internal structures, the new design closes off known leakage channels and remains fast enough for very large data sets. This makes privacy‑preserving union of blacklists, customer IDs, or other identifiers more practical for companies and institutions that need to collaborate without laying bare the patterns inside their own data.
Citation: Liu, Q., Bae, J. & Lee, JW. Towards an improved efficient leakage-resilient enhanced private set union. Sci Rep 16, 10134 (2026). https://doi.org/10.1038/s41598-026-40531-5
Keywords: private set union, data privacy, cryptographic protocols, secure data sharing, leakage resilience