Clear Sky Science · en

VISDB 2.0: A manually curated resource of viral integration sites and their regulatory maps in human diseases

Why viruses rewriting our DNA matters

Many common viruses don’t just infect our cells and leave—they can stitch pieces of their own genetic material into our DNA. These hidden “edits” can help viruses persist for years and sometimes tip cells toward cancer. Until now, information about where in the human genome viruses integrate, and what those spots do, has been scattered across hundreds of studies. This article introduces VISDB 2.0, a greatly expanded online resource that brings these findings together, helping scientists trace how viral DNA insertions may drive disease and reveal new drug targets.

Figure 1.

How viruses leave a lasting genetic footprint

Viruses such as hepatitis B virus, human papillomavirus, Epstein–Barr virus, HIV, and others can integrate their genetic material directly into human chromosomes. For retroviruses such as HIV, whose genomes are made of RNA, the viral RNA is first converted into DNA, and joining the host genome is a required step in the viral life cycle. For DNA viruses such as hepatitis B virus and human papillomavirus, integration is not needed for replication but can occur during infection. These insertions are not random. They often land near important switches that control when human genes turn on or off, including regions that regulate cell growth and immune responses. When viral DNA disrupts these control panels, it can change the behavior of cells, promote chronic infection, or contribute to the development of cancers.

Building a complete map from scattered clues

Over the past two decades, scientists have developed many experimental and computational tools to detect where viral DNA integrates—ranging from focused laboratory tests to whole-genome sequencing and deep-learning models. However, the resulting data have been spread across many papers and a handful of partial databases, making it hard to see the full picture. VISDB 2.0 tackles this problem through large-scale manual curation of 209 peer-reviewed studies published between 2020 and 2025, plus earlier resources. The authors checked each reported integration site, ensured that its coordinates fit the current human genome reference, removed duplicates, and kept only precisely mapped events. The result is a standardized catalog of 270,470 high-confidence viral integration sites across 11 medically important viruses and 45 human diseases.
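The curation steps described above (standardizing records, removing duplicates, and keeping only precisely mapped events) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Site` record, its fields, and the toy coordinates are assumptions made for illustration, and the conversion of coordinates between genome reference builds is not shown.

```python
from typing import Iterable, List, NamedTuple, Optional


class Site(NamedTuple):
    """Hypothetical record for one reported integration site."""
    virus: str
    chrom: str
    pos: Optional[int]  # breakpoint position; None if not precisely mapped
    source: str         # label of the study that reported the site


def normalize_chrom(name: str) -> str:
    """Map spellings such as '5', 'chr5', 'chrx', 'MT' to one convention."""
    core = name.strip()
    if core.lower().startswith("chr"):
        core = core[3:]
    core = core.upper()
    if core == "MT":
        core = "M"
    return "chr" + core


def curate(records: Iterable[Site]) -> List[Site]:
    """Drop imprecise records and keep the first copy of each duplicate."""
    seen = set()
    kept = []
    for rec in records:
        if rec.pos is None:
            continue  # keep only precisely mapped events
        key = (rec.virus.upper(), normalize_chrom(rec.chrom), rec.pos)
        if key in seen:
            continue  # same virus and position already recorded
        seen.add(key)
        kept.append(Site(key[0], key[1], key[2], rec.source))
    return kept


# Toy input: two spellings of the same site, one imprecise record, one unique site.
records = [
    Site("HBV", "chr5", 1295000, "study_A"),
    Site("hbv", "5", 1295000, "study_B"),
    Site("HPV", "chrx", None, "study_C"),
    Site("HPV", "X", 100, "study_D"),
]
curated = curate(records)
```

In this toy input, the two spellings of the same HBV site collapse into one entry and the record without a precise position is dropped, leaving two curated sites.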

Linking viral footprints to gene control and disease

VISDB 2.0 does more than just list where viruses integrate; it also describes what is happening around each site in the human genome. The database records whether an insertion falls inside or near genes, including known cancer-driving genes and tumor suppressors, and whether it sits close to CpG islands (stretches of DNA rich in CG dinucleotides), fragile chromosome regions, or repeated DNA sequences that are prone to breakage. It also overlays rich regulatory and epigenetic information: integration sites are checked for overlap with gene promoters, enhancers, transcription factor binding sites, accessible chromatin, characteristic histone marks, and disease-linked genetic variants. By systematically comparing real integration sites to random locations across the genome, the authors show that viruses preferentially target specific regulatory neighborhoods instead of landing by chance.
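The idea of comparing real integration sites with random locations can be sketched as a simple Monte Carlo test. The summary does not state which statistical procedure the authors used, so the code below is only one common approach, run on a toy single-chromosome "genome"; the function names, intervals, and positions are all hypothetical.

```python
import random
from bisect import bisect_right


def count_overlaps(positions, intervals):
    """Count positions that fall inside sorted, non-overlapping [start, end) intervals."""
    starts = [start for start, _ in intervals]
    hits = 0
    for p in positions:
        i = bisect_right(starts, p) - 1  # last interval starting at or before p
        if i >= 0 and intervals[i][0] <= p < intervals[i][1]:
            hits += 1
    return hits


def enrichment(positions, intervals, genome_len, n_perm=1000, seed=0):
    """Compare the observed overlap count with counts for uniformly random positions."""
    rng = random.Random(seed)
    observed = count_overlaps(positions, intervals)
    null_counts = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_len) for _ in positions]
        null_counts.append(count_overlaps(shuffled, intervals))
    mean_null = sum(null_counts) / n_perm
    # Empirical one-sided p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, mean_null, p_value


# Toy data: two "regulatory" intervals and five integration positions.
regulatory = [(100, 200), (500, 600)]
sites = [110, 150, 550, 590, 900]
observed, mean_null, p_value = enrichment(sites, regulatory, genome_len=100_000)
```

Here four of the five toy sites fall inside the intervals, far more than uniformly random positions would. A real analysis would also need to account for chromosome sizes, unmappable regions, and multiple testing.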

From sequence patterns to treatment possibilities

To understand why certain spots are favored, the team examined the local sequence around each integration site, scanning short stretches of DNA for recurring patterns that resemble binding sites for human regulatory proteins. This reveals potential “landing motifs” that might guide viral integration or shape how nearby genes are controlled afterward. VISDB 2.0 also connects viral integration to noncoding RNAs (small and long RNA molecules that do not make proteins but strongly influence infection, immune responses, and cancer). By matching integration sites to known noncoding RNAs and their target genes, the database highlights pathways that may be rewired by viral activity. Finally, the authors map thousands of drugs from DrugBank to genes affected by viral insertions, assembling a network of potential treatment and repurposing opportunities grounded in real viral–host interactions.
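As a rough illustration of scanning flanking sequence for recurring patterns, the sketch below counts how many sequence windows contain each short word (k-mer). This is a deliberately simple stand-in, not the motif analysis used for VISDB 2.0; the function name and the toy sequences are assumptions and carry no biological meaning.

```python
from collections import Counter


def kmer_window_counts(windows, k):
    """Count, for each k-mer, the number of windows in which it appears at least once."""
    counts = Counter()
    for seq in windows:
        seq = seq.upper()
        present = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip words containing ambiguous bases such as N
                present.add(kmer)
        counts.update(present)  # each window contributes at most once per k-mer
    return counts


# Toy flanking windows around three hypothetical integration sites.
windows = ["ACGTTGACA", "ttgacagg", "CCCNTTGACA"]
counts = kmer_window_counts(windows, k=6)
top_kmer, top_count = counts.most_common(1)[0]
```

In this toy input, one six-letter word is shared by all three windows. Real motif discovery would typically compare such counts against background sequence and match the results to known transcription factor binding profiles.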

Figure 2.

A new starting point for studying virus–human interactions

In everyday terms, VISDB 2.0 is like an upgraded atlas that shows not only where viruses have carved their mark into human DNA, but also what neighborhoods those marks reside in, which residents (genes) live there, and which medicines may influence them. The data are freely available for bulk download and through a web interface that allows users to search by virus, gene, genome region, or disease, and to visualize integration patterns and their regulatory surroundings. By unifying scattered findings into a coherent, quality-checked resource, VISDB 2.0 gives researchers a powerful foundation for discovering how viral infections contribute to cancer and other diseases—and for turning that knowledge into better diagnostics and targeted therapies.

Citation: Citu, C., Singh, A., Liu, X. et al. VISDB 2.0: A manually curated resource of viral integration sites and their regulatory maps in human diseases. Sci Data 13, 695 (2026). https://doi.org/10.1038/s41597-026-07069-7

Keywords: viral integration, human genome, cancer genomics, regulatory DNA, bioinformatics database