Rutgers researchers create a machine-learning tool that analyzes metadata collected by Google and YouTube to predict user psychosocial distress

User data from Google, YouTube and other online platforms can be used to predict, prevent and even mitigate loneliness, potentially lowering the risk of suicide for at-risk individuals, according to a Rutgers study.

“Anxiety and loneliness are typically diagnosed at a doctor’s office,” said Vivek K. Singh, director of the Behavioral Informatics Lab at the Rutgers School of Communication and Information and the corresponding author of the study published in the journal Electronics.

“We wanted to see if data collected passively, by the websites people visit or the search terms they use, could be analyzed by machine learning to be clinically useful,” Singh said. “The goal was to see if there are clear connections between digital traces and wellness indicators.”

With loneliness reaching epidemic levels in the United States, efforts are underway to develop automated or low-cost methods to support people experiencing psychosocial distress. Singh said making better use of our online browsing history could help.  

To determine whether machine learning can accurately predict loneliness, Singh and colleagues recruited 92 volunteers as part of the Rutgers Wellness Study, a 10-week survey of users’ online behaviors and mental health conducted in early 2021, a period of extended COVID-19-related isolation. The team also included Eiman Ahmed, Liyang Xue, Haein Kong and Arcadio Matos of Rutgers’ School of Communication and Information; Aniket Sanap of the Department of Computer Science; and Vincent Silenzio of the School of Public Health. Each participant agreed to share metadata collected by Google and YouTube during searches, website visits and other online activities.

With anonymized versions of these “digital trace data,” researchers created computer models designed to identify online behaviors associated with clinical levels of loneliness. They also conducted mental health surveys with participants every week and at the beginning, middle and end of the study.

The machine-learning models were then directed to predict loneliness levels based on online use. Activities were parsed by category – such as “sports,” “music” and “education” – as well as by aggregates (“weekly number of YouTube sessions,” “weekly number of Google searches”).
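As a rough illustration of the kind of aggregation described above, the sketch below counts one participant's activity per week, both by platform and by content category. The record layout, participant IDs and feature names are hypothetical and are not taken from the study; the sketch also counts individual events rather than sessions, which is a simplification.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical activity records: (participant_id, ISO date, platform, category).
# The field names and values are illustrative, not the study's actual schema.
RECORDS = [
    ("p01", "2021-02-01", "google_search", "education"),
    ("p01", "2021-02-02", "youtube", "music"),
    ("p01", "2021-02-03", "youtube", "sports"),
    ("p01", "2021-02-09", "google_search", "music"),
    ("p02", "2021-02-01", "youtube", "music"),
]


def weekly_features(records):
    """Count activity per (participant, ISO week) by platform and by category."""
    features = defaultdict(Counter)
    for participant, day, platform, category in records:
        # isocalendar() returns (year, week, weekday); index 1 is the ISO week.
        week = date.fromisoformat(day).isocalendar()[1]
        features[(participant, week)][f"n_{platform}"] += 1
        features[(participant, week)][f"cat_{category}"] += 1
    return {key: dict(counts) for key, counts in features.items()}


FEATURES = weekly_features(RECORDS)
```

Each key in `FEATURES` is a participant and week, and each value is a small table of counts that a model could take as input.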

The predictions produced by the models were then compared to the participants’ own mental health survey information.
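One simple way to make such a comparison, assuming each prediction and each survey result is reduced to a "lonely" or "not lonely" label, is to count how often the two agree. The sketch below is an assumption about how that could look, not the researchers' evaluation method, and the example labels are made up for illustration rather than drawn from the study.

```python
def compare(predicted, surveyed):
    """Return (accuracy, confusion counts) for lonely / not-lonely labels.

    Both arguments are equal-length sequences of booleans, where True
    means "lonely". The survey labels are treated as the reference.
    """
    if len(predicted) != len(surveyed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for pred, truth in zip(predicted, surveyed):
        if pred and truth:
            counts["tp"] += 1  # model and survey both say lonely
        elif pred and not truth:
            counts["fp"] += 1  # model says lonely, survey does not
        elif truth:
            counts["fn"] += 1  # survey says lonely, model missed it
        else:
            counts["tn"] += 1  # both say not lonely
    accuracy = (counts["tp"] + counts["tn"]) / len(predicted)
    return accuracy, counts


# Made-up labels for five participants, for illustration only.
PREDICTED = [True, False, True, False, False]
SURVEYED = [True, False, False, False, True]
ACCURACY, COUNTS = compare(PREDICTED, SURVEYED)
```

Keeping the four counts separate, rather than reporting agreement alone, shows whether a model tends to miss lonely participants or to flag people who are not lonely.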

The results were very promising, Singh said. Not only were the models effective at predicting loneliness, he said, but they also shed light on which platforms were better predictors. For instance, “lonely” participants used Google Search more than “not lonely” participants, and “not lonely” participants used YouTube more than “lonely” participants.

There are potential downsides to these types of algorithms, and in the wrong hands data collection technology can be misused, Singh said. But he added that the tools created for this study, which he hopes to develop further, give the highest importance to data security.

Personal identifiers such as names, addresses and phone numbers were removed from all data before analysis. Informed consent and institutional review were part of the study design, and tools compliant with HIPAA, the federal health privacy law, were used for data storage and analysis.

“With refinements, we believe the proposed approach could contribute toward digital health dashboards for individuals, wherein their data, combined with models running on their computers (e.g., as web plugins), could be used for triaging health, and provide support and guidance via awareness material or referrals,” the researchers wrote.

“With artificial intelligence and machine learning, we need to balance opportunities and risk. I would argue that preventing suicides and identifying loneliness are responsibilities we can’t ignore,” Singh said. “Mental health provisions in many communities are very thin; they’re even thinner when you consider things like insurance, stigma and wait times. The upsides of this kind of technology are, I would argue, much greater than the drawbacks, and responsible public-facing tools supporting mental health are a critical need of the hour.”