Integrating social cognitive theory with machine learning to predict MSM-women sexual behavior: a multicenter random forest model development study in China
Why hidden sexual bridges matter
Public health experts worry about "bridge" behaviors that quietly connect groups with high HIV risk to the broader population. In China, some men who have sex with men (MSM) also have sex with women, often while keeping their same‑sex behavior secret. This pattern can unintentionally expose female partners to infection and makes prevention much harder. The study summarized here asks a practical question: can we use insights from psychology together with modern data science to spot this hidden behavior early, in a way that supports people rather than blames them?

A closer look at a hard‑to‑reach community
The researchers worked with community organizations in six Chinese cities to anonymously survey 2,403 men who had sex with men in the previous six months. They asked not only about sexual contacts with men and women, but also about mood, self‑esteem, substance use, relationships, work, education, and living situation. About 17% of participants reported sex with a woman in the last half‑year. Most were young adults, highly educated, and many had moved away from their hometowns. This community‑based approach allowed the team to reach people who might otherwise avoid official surveys because of stigma or fear of being identified.
How psychology and algorithms were combined
The study was guided by Social Cognitive Theory, a framework that views behavior as the product of continuous interaction between personal thoughts and feelings, everyday actions, and the surrounding social world. Using this lens, the team grouped 28 measured factors into three broad areas: personal state (such as depression, anxiety, and self‑esteem), behavior (such as group sex with men or using drugs before sex), and environment (such as education level, marital status, and migration). Instead of letting a computer blindly search through every pattern, the authors first chose variables that theory suggests should matter, then used a machine‑learning method known as random forests to rank which ones actually helped most in predicting sex with women.
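For readers curious how a forest-based ranking works in principle, here is a minimal Python sketch. It is a toy stand-in, not the study's code: it grows a forest of one-split trees ("stumps") rather than full random-forest trees, and it runs on synthetic data. The variable names (`married`, `anxiety`, `noise`), the effect sizes, and the helper `stump_forest_importance` are all invented for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_stump_gain(rows, labels, feature):
    """Largest impurity decrease from a single threshold split on `feature`."""
    parent, n, best = gini(labels), len(rows), 0.0
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue  # not a real split
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def stump_forest_importance(rows, labels, names, n_trees=100, n_try=2, seed=0):
    """Rank features by total impurity decrease across bootstrapped stumps."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in names}
    n = len(rows)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        candidates = rng.sample(names, n_try)           # random feature subset
        gains = {f: best_stump_gain(b_rows, b_labels, f) for f in candidates}
        winner = max(gains, key=gains.get)
        totals[winner] += gains[winner]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic survey: the outcome depends strongly on `married`, weakly on
# `anxiety`, and not at all on `noise`. These numbers are made up.
data_rng = random.Random(1)
names = ["married", "anxiety", "noise"]
rows, labels = [], []
for _ in range(200):
    row = {
        "married": 1 if data_rng.random() < 0.3 else 0,
        "anxiety": data_rng.randint(0, 21),
        "noise": data_rng.randint(0, 9),
    }
    p = 0.05 + 0.5 * row["married"] + 0.01 * row["anxiety"]
    rows.append(row)
    labels.append(1 if data_rng.random() < p else 0)

ranking = stump_forest_importance(rows, labels, names)
for name, score in ranking:
    print(f"{name:8s} {score:.3f}")
```

The key idea carries over to a real random forest: each tree sees a resampled version of the data and only a random subset of variables, and a variable earns importance whenever splitting on it separates the two outcome groups more cleanly.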
Building a compact risk score
From the original 28 measures, the algorithm identified a compact set of nine that carried most of the predictive power: anxiety, depression, self‑esteem, age, education level, marital status, sexual orientation, recent group sex with men, and drug use before sex. These nine factors were then fed into a simpler statistical model that outputs the probability that a given man has recently had sex with a woman. In repeated rounds of training and testing on different subsets of the data, the model distinguished MSM who did report sex with women from those who did not with reasonably high accuracy: about 80% on a standard performance scale. Its risk estimates also matched observed frequencies well, meaning the predicted probabilities were not systematically too high or too low.
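The two checks described above, telling the groups apart and producing honest probabilities, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published model: it assumes the "simpler statistical model" behaves like a logistic regression and that the performance scale resembles the area under the ROC curve, neither of which this summary specifies. Every coefficient, the intercept, and the variable encodings are hypothetical.

```python
import math

# Hypothetical log-odds coefficients for the nine predictors.
# Signs loosely follow the directions described in this article;
# the magnitudes are invented and are NOT the paper's estimates.
COEFS = {
    "anxiety_score": 0.08,
    "depression_score": 0.06,
    "self_esteem_score": -0.05,
    "age_years": -0.02,
    "education_level": -0.15,
    "married": 1.2,
    "orientation_not_unsure": 0.6,
    "recent_group_sex": 0.5,
    "drug_use_before_sex": 0.4,
}
INTERCEPT = -2.0  # hypothetical

def predict_probability(person):
    """Logistic model: turn a weighted sum of predictors into a probability."""
    z = INTERCEPT + sum(COEFS[name] * person[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores, labels):
    """Chance that a random positive case scores above a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def calibration_gap(scores, labels):
    """Mean predicted probability minus observed frequency (0 is ideal)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)
```

On this kind of scale, 0.5 means the scores are no better than a coin flip and 1.0 means every positive case outranks every negative one. A calibration gap near zero means that, on average, predicted probabilities match how often the behavior was actually reported.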

What the model reveals about risk patterns
The strongest signals came from marital status and how participants labeled their sexual orientation, followed by psychological distress and certain behaviors. Men who were married, or who identified as homosexual or bisexual rather than unsure, were more likely to report sex with women. Higher anxiety and depression scores and lower self‑esteem were also associated with a greater likelihood of sex with women, as were recent group sex with men and using drugs before sex. Younger age and lower education were likewise associated with a higher likelihood. Importantly, the model performed similarly well across different ages, education levels, and marital situations, and for both migrants and local residents, suggesting that the risk score is not limited to a narrow subgroup.
Turning numbers into a practical, non‑blaming tool
To make the results usable outside a statistics lab, the team converted the nine key predictors into a simple scoring chart, or nomogram. A counselor, clinician, or outreach worker can use this chart to assign points for each person’s mood scores, relationship status, education, recent behaviors, and so on; the total points translate into an estimated chance that the person is also having sex with women. The authors emphasize that this tool is designed for confidential, supportive conversations and early prevention—helping target counseling, testing, and safer‑sex resources to those who might serve as hidden bridges—rather than to label individuals or increase stigma.
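The arithmetic behind a scoring chart of this kind can be sketched in Python. The example below shows one common way to build nomogram points from a logistic model: the predictor with the largest possible effect spans 0 to 100 points, and the others are scaled in proportion. It uses only three predictors, and the coefficients, value ranges, and intercept are hypothetical. It is not the published nomogram.

```python
import math

# name: (hypothetical coefficient, lowest value, highest value)
PREDICTORS = {
    "married": (1.2, 0, 1),
    "anxiety_score": (0.08, 0, 21),
    "self_esteem_score": (-0.05, 10, 40),
}
INTERCEPT = -2.5  # hypothetical

# The largest effect any single predictor can have sets the 100-point scale.
MAX_EFFECT = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())

def points(name, value):
    """Points for one predictor; 0 at its lowest-risk end of the range."""
    beta, lo, hi = PREDICTORS[name]
    start = lo if beta >= 0 else hi
    return 100.0 * beta * (value - start) / MAX_EFFECT

def total_points(person):
    """Sum of points across all predictors on the chart."""
    return sum(points(name, person[name]) for name in PREDICTORS)

def probability_from_points(total):
    """Map total points back to the model's predicted probability."""
    # Linear predictor when every variable sits at its lowest-risk value.
    base = INTERCEPT + sum(
        b * (lo if b >= 0 else hi) for b, lo, hi in PREDICTORS.values()
    )
    z = base + total * MAX_EFFECT / 100.0
    return 1.0 / (1.0 + math.exp(-z))

person = {"married": 1, "anxiety_score": 12, "self_esteem_score": 22}
score = total_points(person)
print(f"total points: {score:.1f}")
print(f"estimated probability: {probability_from_points(score):.2f}")
```

Because points are just rescaled model terms, adding them up on paper gives the same answer as running the underlying equation, which is what lets a counselor use the chart without a computer.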
Citation: Liu, S., Gao, Y., Xu, H. et al. Integrating social cognitive theory with machine learning to predict MSM-women sexual behavior: a multicenter random forest model development study in China. Sci Rep 16, 6029 (2026). https://doi.org/10.1038/s41598-026-36202-0
Keywords: HIV prevention, bisexual behavior, machine learning, mental health, China MSM