Clear Sky Science
Probabilities of two alleles being identity by state at unobserved loci predicted by observed loci in cattle populations
Why cattle family trees are no longer enough
Modern cattle breeding relies on choosing the right parents to produce healthy, productive animals. For more than a century, breeders have used family trees, or pedigrees, to avoid close inbreeding that can harm fertility, growth, and disease resistance. But pedigrees are often incomplete or contain errors, and even a correct pedigree gives only the expected similarity between relatives, not how much DNA they actually share. This study asks a simple but important question: if we look directly at DNA instead of paper records, can we better see which animals are truly genetically alike, even in parts of the genome we have not measured?
Looking for hidden genetic twins in the genome
The researchers focused on a concept called "identity by state" (IBS). Two alleles (DNA letters) at the same position are IBS if they are identical, regardless of whether they were inherited from a recent common ancestor. In practice, breeders genotype animals at only a subset of DNA markers called SNPs, leaving many positions unobserved. The team wanted to know how well different methods based on the observed SNPs could predict the probability that two alleles, either the two copies within one animal or one copy from each of two animals, are IBS at these unobserved sites. In other words, how well can the measured markers reveal genetic similarity in the parts of the genome that were never measured?
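To make the two kinds of IBS probability concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the authors' code: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele, and the function names `ibs_within` and `ibs_between` are hypothetical. Within one animal, the two alleles at a locus match exactly when the animal is homozygous; between two animals, the probability is taken over one allele drawn at random from each.

```python
def ibs_within(genotypes):
    """Share of loci at which one animal's two alleles are identical by state.

    `genotypes` holds allele counts (0, 1 or 2) at each locus; 0 and 2 are
    homozygous, so both alleles match, while 1 is heterozygous.
    """
    return sum(1 for g in genotypes if g in (0, 2)) / len(genotypes)


def ibs_between(geno_a, geno_b):
    """Average probability that one allele drawn at random from animal A and
    one drawn at random from animal B are identical by state."""
    total = 0.0
    for a, b in zip(geno_a, geno_b):
        # P(both draws carry the counted allele) + P(both carry the other one)
        total += (a * b + (2 - a) * (2 - b)) / 4
    return total / len(geno_a)


print(ibs_within([0, 1, 2, 2]))            # 0.75
print(ibs_between([0, 1, 2], [0, 1, 0]))   # 0.5
```

In the second example, the first locus always matches (probability 1), the second matches half the time, and the third never matches, giving an average of 0.5.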

Simulated herds and real cattle data
To test this, the authors used two kinds of data. First, they simulated cattle populations over many generations, controlling factors such as the effective population size (how many animals effectively contribute genes) and whether parents were selected at random or on estimated breeding values for a trait. They simulated large numbers of SNPs and then split them into "observed" and "unobserved" markers. The unobserved set served as the reference: the true probabilities of matching alleles at loci the methods never saw. Second, they repeated the analyses with real high-density genotypes from Japanese Black cattle, a major beef breed, using one subset of SNPs as observed markers and another subset as unobserved reference points.
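The observed/unobserved design can be sketched in a few lines of Python. This is a hypothetical illustration: the function names and the toy herd are invented here, genotypes are assumed to be coded 0, 1 or 2, and the random split is an assumption (the paper's actual choice of observed SNPs, for example a commercial array panel, may differ). The "predictor" below is deliberately the simplest possible one: the within-animal IBS share on observed loci, compared against the same quantity on unobserved loci.

```python
import random


def split_loci(n_loci, n_observed, seed=0):
    """Randomly assign locus indices to an observed panel and an unobserved
    reference set (a random split is an assumption made for illustration)."""
    rng = random.Random(seed)
    indices = list(range(n_loci))
    rng.shuffle(indices)
    return sorted(indices[:n_observed]), sorted(indices[n_observed:])


def homozygous_share(genotypes, loci):
    """Share of the listed loci where an animal carries two identical alleles."""
    return sum(1 for i in loci if genotypes[i] in (0, 2)) / len(loci)


# Toy herd (made-up data): each row is one animal's allele counts at 10 loci.
herd = [
    [0, 1, 2, 2, 0, 1, 0, 2, 2, 1],
    [1, 1, 0, 2, 2, 0, 1, 1, 0, 2],
    [2, 2, 2, 0, 0, 2, 2, 0, 1, 2],
]
observed, unobserved = split_loci(len(herd[0]), n_observed=4)
predicted = [homozygous_share(g, observed) for g in herd]    # from markers we "see"
reference = [homozygous_share(g, unobserved) for g in herd]  # hidden reference values
```

Keeping the two locus sets disjoint matters: a predictor scored on the same loci it was computed from would look perfect by construction.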
Comparing pedigree scores with DNA-based measures
The study evaluated many different DNA-based measures of inbreeding within animals and genetic relatedness between animals. Some methods looked at each SNP independently, while others considered longer stretches of DNA: either runs of homozygosity, where both copies of a chromosome segment are identical over many consecutive SNPs, or segments modeled as inherited from a common ancestor. For each measure, the team calculated how strongly its predictions matched the reference IBS values at unobserved sites, using correlation as a measure of accuracy. They also compared these DNA-based measures with traditional pedigree-based inbreeding and relationship coefficients, which are widely used in breeding programs.
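Accuracy measured as a correlation can be illustrated with a plain Pearson correlation between predicted and reference values. This is a minimal sketch; the function name `pearson` is hypothetical, and the paper may have used a statistics package rather than a hand-written formula.

```python
import math


def pearson(predicted, reference):
    """Pearson correlation between predicted values and reference values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```

One convenient property: shifting a predictor or multiplying it by a positive constant leaves the correlation unchanged, so measures expressed on different scales (for example, a probability versus an inbreeding coefficient) can be compared directly.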

DNA markers clearly outperform pedigrees
Across both simulated and real cattle populations, genome-based measures consistently outperformed pedigree-based measures in predicting IBS at unobserved loci. In particular, the methods that treated every SNP as if both alleles had a frequency of 0.5 in an ancestral population, labeled in the paper as FGRMV2 (within an animal) and fGRMV2 (between animals), showed very high accuracy. So did segment-based measures, especially FHBD, which models segments inherited from a common ancestor, and measures that counted relatively short runs of homozygosity across the whole genome (FROH4all and its between-animal counterpart fSEG4). These top-performing measures remained accurate even when selection was applied over many generations, and they tracked rising inbreeding more reliably than pedigree-based estimates.
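The two families of measures singled out above can be sketched in Python. This is a hedged illustration, not the paper's code. The function names are hypothetical; the relationship uses a VanRaden-style genomic relationship formula with every allele frequency fixed at 0.5, which is assumed here to correspond to the "0.5" variants; and the run-of-homozygosity measure is simplified to count SNPs on a single chromosome rather than physical length, with a hypothetical `min_snps` threshold standing in for whatever run-length cutoff FROH4all actually uses.

```python
def grm_p05(geno_a, geno_b):
    """VanRaden-style genomic relationship between two animals, with every
    allele frequency fixed at 0.5 (genotypes coded 0, 1, 2)."""
    m = len(geno_a)
    # Centre each allele count by 2p = 1, so values become -1, 0 or +1.
    cross = sum((a - 1) * (b - 1) for a, b in zip(geno_a, geno_b))
    # Scaling term 2 * sum(p * (1 - p)) = 2 * m * 0.25 = 0.5 * m.
    return cross / (0.5 * m)


def inbreeding_grm_p05(geno):
    """Genomic inbreeding from the diagonal: F = G_ii - 1."""
    return grm_p05(geno, geno) - 1


def coancestry_grm_p05(geno_a, geno_b):
    """Genomic coancestry from the off-diagonal: f = G_ij / 2."""
    return grm_p05(geno_a, geno_b) / 2


def froh_snp_count(genotypes, min_snps):
    """Share of SNPs lying in runs of at least `min_snps` consecutive
    homozygous genotypes (a simplified, SNP-count version of F_ROH)."""
    in_roh = 0
    run = 0
    # Append a heterozygote so the final run is closed and counted.
    for g in list(genotypes) + [1]:
        if g in (0, 2):
            run += 1
        else:
            if run >= min_snps:
                in_roh += run
            run = 0
    return in_roh / len(genotypes)


print(inbreeding_grm_p05([0, 1, 2, 2]))                          # 0.5
print(froh_snp_count([0, 0, 2, 1, 2, 2, 1, 0], min_snps=3))      # 0.375
```

A consequence of this particular formula (an algebraic fact, not a claim from the paper): with all frequencies fixed at 0.5, F = 2 × (share of homozygous loci) − 1 and f = 2 × (average between-animal IBS probability) − 1, so both are linear rescalings of the IBS shares observed at the genotyped SNPs.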
What this means for breeders and food security
For a non-specialist, the takeaway is that looking directly at DNA gives a much clearer picture of how genetically similar cattle really are than relying on family trees alone. By using particular genome-based indicators, breeders can better monitor hidden inbreeding, protect genetic diversity, and design matings that balance genetic progress with long-term herd health. This matters not only for avoiding inbreeding depression today but also for keeping enough genetic variety to adapt cattle to future challenges, such as new diseases or a changing climate.
Citation: Nagai, R., Honda, T., Satoh, M. et al. Probabilities of two alleles being identity by state at unobserved loci predicted by observed loci in cattle populations. Sci Rep 16, 7454 (2026). https://doi.org/10.1038/s41598-026-37530-x
Keywords: cattle genetics, inbreeding, genomic selection, genetic diversity, SNP markers