Clear Sky Science
Improving deep neural network performance through sampling
Smarter AI With Tiny Coin-Flip Neurons
As artificial intelligence has grown more powerful, it has also grown ravenous for energy. Training and running modern image and language models can draw as much electricity as small towns. This paper explores a counterintuitive idea: instead of making neural networks ever more precise and complex, we might make their building blocks simpler and noisier—more like flipping digital coins—and then use clever sampling to get equal or even better results while saving energy.
From Precise Circuits to Probabilistic Brains
Most of today’s deep neural networks use “deterministic” units: feed in the same numbers and you always get the same answer. The authors focus on an alternative called probabilistic bits, or p-bits. Each p-bit behaves like a tiny, biased coin that flips between 0 and 1 according to probabilities set by its inputs. By taking several samples from the same network of p-bits and averaging their outputs, the system can approximate richer, multi-bit behavior without storing or shuffling as many precise numbers. This idea connects modern AI to earlier Ising machines and Boltzmann machines, where such probabilistic units were already known to be efficient for optimization and sampling problems.
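Here is a minimal Python sketch of how one p-bit behaves (illustrative only: the sigmoid link between input and coin bias, and all the numbers, are assumptions for the example, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_bit_average(drive, n_samples):
    """Average n_samples flips of a p-bit whose bias is set by its
    input drive (a sigmoid link is assumed here for illustration)."""
    p = 1.0 / (1.0 + np.exp(-drive))    # probability the coin lands on 1
    flips = rng.random(n_samples) < p   # independent biased coin flips
    return flips.mean()                 # average approaches p as n grows

# One flip is a crude 1-bit guess; averaging many flips recovers a
# smoother, effectively multi-bit estimate of the activation.
for n in (1, 4, 16, 64):
    print(f"{n:3d} samples -> {p_bit_average(0.5, n):.3f}")
```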

Using Many Quick Guesses Instead of One Heavy Answer
The study asks a simple but practical question: if we want better accuracy, is it cheaper to add more digital precision to each neuron, or to keep neurons extremely simple and instead draw multiple samples from them? The authors build a general energy formula that breaks the cost of one elementary operation in a neural network into four parts: reading weights from memory, reading and writing activations, combining inputs (the synapse), and applying the nonlinearity (the neuron). Importantly, weights can be read once and then reused to generate several samples, so the dominant cost, accessing memory, is amortized over many runs. As a result, ten samples cost far less than ten times as much as one.
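As a toy illustration of that accounting (the energy values below are made-up arbitrary units, not the paper’s measurements), one can model the per-inference cost as a fixed weight-read term plus terms that scale with the number of samples:

```python
def energy_per_inference(n_samples,
                         e_weight_read=100.0,  # fetch weights from memory (dominant)
                         e_activation=5.0,     # read and write activations
                         e_synapse=1.0,        # combine inputs (additions)
                         e_neuron=0.5):        # apply the stochastic nonlinearity
    # Weights are fetched once and reused across samples, so only
    # the last three terms scale with the number of samples drawn.
    return e_weight_read + n_samples * (e_activation + e_synapse + e_neuron)

baseline = energy_per_inference(1)
for n in (1, 2, 10):
    print(f"{n:2d} samples: {energy_per_inference(n) / baseline:.2f}x baseline")
# With these illustrative numbers, 10 samples cost about 1.5x one
# sample rather than 10x, because the memory term does not repeat.
```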
Testing Probabilistic Networks on Images
To see whether this tradeoff pays off in practice, the researchers test probabilistic deep neural networks (p-DNNs) on both image classification (CIFAR-10) and image generation (faces from CelebA and digits from MNIST). They replace standard multi-bit activation signals with single-bit p-bits and train the networks in a “sample-aware” way, where the loss function is computed on the average of several stochastic forward passes. For classification, they find that even with 1-bit activations, a single sample can match the accuracy of a full-precision model, and two samples outperform it. With more samples, 1-bit p-DNNs approach the accuracy of 3-bit deterministic networks. For image generation, naively replacing activations with p-bits produces noisy images, but retraining with the stochastic elements in place and carefully handling the final layer yields face images whose quality nearly matches the 32-bit baseline, as measured by a standard distance metric.
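The “sample-aware” training idea can be sketched in a few lines (a hypothetical one-layer model; the paper’s actual architectures, gradient estimators, and loss functions are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_pass(x, w):
    """One forward pass of a toy one-layer p-DNN: real-valued
    pre-activations drive p-bits that emit 1-bit activations."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))             # p-bit firing probabilities
    return (rng.random(p.shape) < p).astype(float)

def sample_aware_output(x, w, n_samples):
    """Average several stochastic passes; a sample-aware loss is
    computed on this average, not on any single noisy pass."""
    return np.mean([stochastic_pass(x, w) for _ in range(n_samples)], axis=0)

x = rng.normal(size=(8, 16))   # toy input batch
w = rng.normal(size=(16, 4))   # toy weight matrix
y_avg = sample_aware_output(x, w, n_samples=4)   # fed to the training loss
```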
Energy Costs and Real Hardware
The authors go beyond simulations and examine energy on real hardware. Using data from a 65 nm chip built for probabilistic circuits, plus additional circuit simulations, they show that large modern AI workloads are dominated by memory energy, not arithmetic. Because p-DNNs dramatically simplify the main compute step, replacing full multiply-and-accumulate operations with simple additions of 1-bit activations, the extra compute needed to take a handful of samples barely changes the total energy when weights live in power-hungry external memory. They validate these predictions on an FPGA implementation of an image-generating network: the probabilistic version reduces overall energy per inference by a factor of about 2.5 compared with a standard design, while producing comparable digit images. The overhead of random-number generation and comparisons is tiny relative to memory access and basic arithmetic.
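The compute simplification itself is easy to see: when activations are single bits, a multiply-and-accumulate collapses into summing the weights whose activation fired (a generic illustration, not the chip’s circuit):

```python
import numpy as np

w = np.array([0.3, -1.2, 0.7, 0.1])   # multi-bit weights, unchanged
a = np.array([1, 0, 1, 1])            # 1-bit p-bit activations

mac = np.dot(w, a)                    # standard multiply-and-accumulate
add_only = w[a == 1].sum()            # with 1-bit inputs: just add the
                                      # weights whose activation is 1
assert np.isclose(mac, add_only)      # same result, no multipliers needed
```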

Why Adjustable Sampling Matters
A distinctive benefit of probabilistic networks is that accuracy can be tuned at run time by changing the number of samples. A single 1-bit p-DNN engine can behave like a 1-, 2-, or 3-bit quantized model depending on how many samples it takes, without redesigning the hardware. This flexibility is especially attractive for large language models, where weight precision is already being pushed down to a few bits, but activation precision is harder to reduce without hurting quality. The framework in this paper shows how to estimate, for any such model, whether drawing extra samples is worth the energy compared with increasing bit widths.
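One back-of-the-envelope way to see the correspondence (a counting argument offered for intuition, not the paper’s exact equivalence): the average of n one-bit samples can take n + 1 distinct values, which is roughly log2(n + 1) bits of activation resolution:

```python
import math

# Averaging n one-bit samples yields values 0/n, 1/n, ..., n/n:
# n + 1 levels, or about log2(n + 1) bits of resolution.
for n in (1, 3, 7):
    levels = n + 1
    print(f"{n} samples -> {levels} levels ~ {math.log2(levels):.0f}-bit activations")
```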
A New Path to Efficient, Flexible AI
In plain terms, the paper demonstrates that “noisy” neural units can be harnessed rather than avoided. By treating each forward pass as a cheap, approximate guess and then averaging a small number of these guesses, networks can reach near–full-precision performance with drastically simpler computations and modest energy overhead. Because memory dominates the power bill, the cost of extra sampling is small, especially when weights are read once and reused. This suggests a promising route to AI hardware that is not only more energy efficient, but also adaptable on the fly—dialing sampling up or down to trade accuracy for battery life or speed as needed.
Citation: Ghantasala, L.A., Li, M.C., Jaiswal, R. et al. Improving deep neural network performance through sampling. npj Unconv. Comput. 3, 18 (2026). https://doi.org/10.1038/s44335-026-00063-7
Keywords: probabilistic neural networks, energy-efficient AI, sampling-based inference, low-precision computing, deep learning hardware