Clear Sky Science
PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT Dataset with Corresponding Radiology Reports
Why this new cancer imaging resource matters
Cancer doctors increasingly rely on advanced scans and computer tools to see how tumors behave throughout the body. But powerful artificial intelligence systems need huge, carefully organized collections of real patient scans to learn from, and those are surprisingly rare and hard to share safely. This article introduces PETWB-REP, a new public collection of whole-body cancer scans and matching doctor reports that aims to speed the development of better diagnostic tools and support more precise research around the world.

A window into the whole body
The PETWB-REP project centers on a type of scan called FDG PET/CT, which combines two views of the body at once. The CT part shows detailed anatomy, like bones and organs, while the PET part lights up areas that are using a lot of sugar, often a sign of active cancer. By fusing these images, doctors can see not just where tumors sit, but how active they are. The new dataset gathers whole-body scans from 490 people with many different cancers, including lung, liver, breast, prostate, ovarian, and others, making it much broader than many earlier collections that focused on a single tumor type.
From clinic visit to research-ready data
All of the scans were collected at a large imaging center in Shanghai between 2021 and 2024 during routine care, under oversight from an ethics committee. Patients fasted before their scans, received a carefully measured injection of a radioactive sugar, and then rested to allow the tracer to spread through the body. Each scan covered the body from the base of the skull to the mid-thigh, following a standardized protocol so that images could be compared across patients. In addition to the pictures themselves, the team recorded basic information such as age, sex, cancer type, and details of how the scans were performed, and stored everything in a consistent structure designed for sharing medical images.
Protecting privacy while keeping detail
Turning clinical scans into a safe public resource required a careful process of stripping away personal information while keeping medically useful detail. The researchers first erased names, IDs, and other identifiers from the image files and replaced them with study codes. They then used a specialized tool to digitally remove facial features from the CT images so that patients could not be recognized, while leaving the neck and body anatomy intact for analysis. Two researchers manually checked the scans and text to be sure nothing identifiable remained. The result is a set of images and reports that preserve tumor patterns and organ structure but no longer reveal who the patients are.
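For readers who work with such data, the first step described above (erasing identifiers and substituting study codes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual pipeline: the field names, the `CASE0001`-style code format, and all function names are hypothetical, and the automated text check at the end is a simple aid that does not replace the manual review the team performed. Face removal from CT images is a separate step and is not shown.

```python
from typing import Dict, List

# Hypothetical list of identifying fields; the article does not give the real one.
IDENTIFYING_FIELDS = {"patient_name", "patient_id", "birth_date", "accession_number"}


def assign_study_codes(patient_ids: List[str], prefix: str = "CASE") -> Dict[str, str]:
    """Map each distinct patient ID to a sequential study code, in first-seen order."""
    codes: Dict[str, str] = {}
    for pid in patient_ids:
        if pid not in codes:
            codes[pid] = f"{prefix}{len(codes) + 1:04d}"
    return codes


def deidentify(record: Dict[str, str], codes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record without identifying fields, labelled by study code."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["study_code"] = codes[record["patient_id"]]
    return cleaned


def residual_identifiers(text: str, known_identifiers: List[str]) -> List[str]:
    """List known identifiers that still appear in free text (case-insensitive)."""
    lowered = text.lower()
    return [ident for ident in known_identifiers if ident and ident.lower() in lowered]
```

In use, `assign_study_codes` would be run once over all patient IDs, `deidentify` applied to each metadata record, and `residual_identifiers` run over report text to flag cases for a human reviewer to inspect.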
Bridging pictures and words
One distinctive feature of PETWB-REP is that each scan comes with a full radiology report written by experienced nuclear medicine doctors. These reports describe what the doctors saw in different regions of the body, note the size and behavior of suspicious spots, and end with an overall impression. To open the dataset to a global audience, the original Chinese reports were translated into English using machine translation and then carefully corrected by a bilingual specialist, with both languages released side by side. This rich pairing of pictures and narrative makes the dataset well suited to training computer systems that can link patterns in images to the way doctors describe and interpret them.

How researchers can use this resource
The final dataset is organized into "raw" scans and processed versions that are easier for computers to handle. The team converted the data into a widely used research format, adjusted image brightness and contrast, aligned the PET and CT views, and created a master table summarizing each case. They also ran quality checks to ensure that every patient has matching scans and reports and that the images are free of major flaws. With this foundation, researchers can build and test tools to automatically find and outline tumors, combine image and text information to predict outcomes, or generate draft reports from scans. Although the data come from a single center and the mix of cancers reflects local practice, the size, variety, and careful preparation of PETWB-REP make it a valuable new starting point for both medical and artificial intelligence studies.
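The completeness check mentioned above, confirming that every patient has matching scans and reports, is easy to reproduce on a downloaded copy. The sketch below assumes the master table has been loaded as a list of rows with hypothetical column names (`case_id`, `pet`, `ct`, `report_zh`, `report_en`); the real table's columns may differ, so treat the names as placeholders.

```python
from typing import Dict, List, Optional

# Hypothetical column names for the items each case is expected to have.
REQUIRED_ITEMS = ("pet", "ct", "report_zh", "report_en")


def find_incomplete_cases(
    master_table: List[Dict[str, Optional[str]]],
) -> Dict[str, List[str]]:
    """Return {case_id: [missing items]} for every case lacking a scan or report."""
    problems: Dict[str, List[str]] = {}
    for row in master_table:
        # Treat absent keys, None, and blank strings all as "missing".
        missing = [item for item in REQUIRED_ITEMS if not (row.get(item) or "").strip()]
        if missing:
            problems[str(row["case_id"])] = missing
    return problems
```

An empty result means every listed case has both scan types and both report languages. A non-empty result names the cases to inspect before training or evaluation.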
Citation: Xue, L., Feng, G., Zhang, W. et al. PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT Dataset with Corresponding Radiology Reports. Sci Data 13, 675 (2026). https://doi.org/10.1038/s41597-026-07058-w
Keywords: PET/CT imaging, multi-cancer dataset, radiology reports, medical AI, multimodal imaging