Clear Sky Science

Data storage and retrieval with unnatural proteins expressed via E. coli


Why turning data into protein matters

Our phones, sensors, and online lives are flooding the world with information, and today’s hard drives and magnetic tapes may not keep up forever. This study explores a strikingly different idea: storing digital data inside laboratory-made proteins that can be produced by common bacteria. The authors show that these custom proteins can hold messages, survive harsh conditions better than DNA, and even support advanced tricks like selective access and secret "locked" information.

Figure 1.

From ones and zeros to chains of building blocks

Any digital file is ultimately a long string of ones and zeros. The researchers first convert these bits into a sequence of amino acids, the small building blocks that make up proteins. Each chosen amino acid stands for a short pattern of three bits, so a chain of amino acids becomes a coded message. These artificial sequences are then inserted into longer protein designs and produced inside Escherichia coli, a workhorse bacterium widely used in biotechnology. Once made, the proteins are dried into a powder, which becomes the physical medium that stores the information.
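The three-bits-per-amino-acid idea can be sketched in a few lines of Python. This is only an illustration: the summary does not say which eight amino acids the authors chose or how they pad a message, so the `ALPHABET` string and the zero-padding rule below are assumptions, and the real protein designs add template and framework sequence that is not modeled here.

```python
# Assumed, illustrative alphabet: one amino acid (one-letter code) per
# 3-bit pattern, index 0 = "000" ... index 7 = "111". The study's actual
# mapping is not given in this summary.
ALPHABET = "AGSTVLEK"
BITS_PER_RESIDUE = 3


def text_to_bits(text: str) -> str:
    """Turn text into a string of '0'/'1' characters, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def encode(text: str) -> str:
    """Map text to an amino acid string, 3 bits per residue."""
    bits = text_to_bits(text)
    # Assumed padding rule: add zeros until the length is a multiple of 3.
    bits += "0" * ((-len(bits)) % BITS_PER_RESIDUE)
    return "".join(
        ALPHABET[int(bits[i:i + BITS_PER_RESIDUE], 2)]
        for i in range(0, len(bits), BITS_PER_RESIDUE)
    )


def decode(residues: str) -> str:
    """Map an amino acid string back to text, discarding padding bits."""
    bits = "".join(f"{ALPHABET.index(r):03b}" for r in residues)
    usable = len(bits) - len(bits) % 8  # at most 2 padding bits are dropped
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


print(encode("Hi"))          # SSAEVV
print(decode(encode("Hi")))  # Hi
```

Under this scheme a message of n bytes needs ceil(8n / 3) residues, so the two-byte text "Hi" occupies six amino acids.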

Why early designs struggled and collagen showed the way

The team’s first approach simply stitched together many data-carrying segments into one long protein. While elegant on paper, these unnatural chains did not behave well inside E. coli: they were poorly produced and easily chopped up by the cell’s own enzymes. To fix this, the researchers took inspiration from collagen, a tough structural protein found in bones and fossil remains that can persist for millions of years. They built a new template that mimics collagen’s repeating pattern and fused it with a collagen-like domain known to express well in bacteria. This collagen-style framework still leaves room to encode data, but gives the overall protein a more natural shape that the cell can tolerate and that resists unwanted breakdown.

Writing, reading, and scaling up protein memory

With the collagen-inspired design, the scientists successfully stored English text and famous quotes in multiple languages across several different proteins. They showed that E. coli can produce these data-bearing proteins at useful yields, and that standard biochemical tools can purify them without extreme effort. To read the stored information, the proteins are cut into shorter pieces by an enzyme, and the pieces are then analyzed by a sensitive mass spectrometer that weighs them. Custom software reconstructs the original amino acid sequences and converts them back into bits. Even when up to about one in ten fragments is missing or wrong, built-in error-correcting codes allow the full messages to be recovered accurately, including when many different proteins are mixed together.
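To give a feel for how redundancy lets a message survive a lost fragment, here is a minimal sketch using a single XOR parity chunk. This is a stand-in technique chosen for brevity, not the error-correcting code used in the study (which the summary does not name), and it can repair only one missing chunk per group rather than the roughly ten percent loss reported. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks plus one parity chunk (XOR of all of them).

    All chunks are assumed to have the same length.
    """
    return chunks + [reduce(xor_bytes, chunks)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (None) and return the data chunks."""
    missing = [i for i, chunk in enumerate(received) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one loss")
    repaired = list(received)
    if missing:
        present = [chunk for chunk in received if chunk is not None]
        # XOR of every surviving chunk equals the one that was lost.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity chunk


data = [b"To b", b"e or", b" not"]
stored = add_parity(data)
stored[1] = None  # simulate a fragment the instrument failed to detect
print(recover(stored) == data)  # True
```

Practical schemes that tolerate many missing or wrong fragments rely on stronger codes, but the principle is the same: extra symbols written alongside the data let the reader fill in what the instrument missed.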

Figure 2.

Stability, selective access, and hidden messages

A key promise of molecular storage is long life. The authors compared one of their collagen-like proteins with a DNA sequence carrying the same message under hot and strongly acidic conditions. The protein retained most of its mass and remained readable after days at 70 degrees Celsius and at very low pH, while the DNA rapidly degraded. They then showed that extra short tags added to the protein ends can act like barcodes: using matching antibodies, they could pull out only the proteins related to a chosen quote from a complex mixture and read just that part of the data. By combining "decoy" proteins with ordinary tags and "secret" proteins marked only with special tags, they built a simple form of molecular cryptography, where only someone who knows the correct tag can reliably retrieve the hidden message.
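The tag-based retrieval described above behaves much like a keyed lookup, which the following toy model tries to capture. It is a sketch under stated assumptions: the tag names, the `StoredProtein` class, and the `pull_down` function are hypothetical, and the antibody step is reduced to an exact string match on the tag.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredProtein:
    tag: str      # short end tag recognised by a matching antibody (made-up names)
    payload: str  # text carried by this protein once decoded


def pull_down(mixture: List[StoredProtein], tag: str) -> List[str]:
    """Model an antibody pull-down: keep only proteins carrying `tag`."""
    return [protein.payload for protein in mixture if protein.tag == tag]


mixture = [
    StoredProtein("TAG_COMMON", "decoy text A"),
    StoredProtein("TAG_COMMON", "decoy text B"),
    StoredProtein("TAG_SECRET", "hidden message"),
]

print(pull_down(mixture, "TAG_COMMON"))  # ['decoy text A', 'decoy text B']
print(pull_down(mixture, "TAG_SECRET"))  # ['hidden message']
```

In this model, a reader who knows only the ordinary tag retrieves the decoys, while the hidden message comes out only for someone who asks for the special tag.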

What this means for the future of data

This work delivers the first full demonstration that entirely new, non-natural proteins can act as a robust medium for digital data, from writing and storage to accurate readout. While current capacities and speeds are far from everyday use, the approach offers very high potential density and impressive stability, especially for long-term archiving. As tools for designing, producing, and sequencing proteins continue to advance, data encoded in proteins could complement DNA and traditional hardware, enabling durable archives on Earth or even in space, and potentially allowing information to be stored directly within living systems under careful safeguards.

Citation: Zhou, Y., Ng, C.C.A., Liu, C. et al. Data storage and retrieval with unnatural proteins expressed via E. coli. Nat Commun 17, 3320 (2026). https://doi.org/10.1038/s41467-026-70061-7

Keywords: protein data storage, molecular memory, E. coli expression, collagen-like proteins, data cryptography