Protein Data Bank Marks 50 Years Helping to Unlock Mysteries of Human Disease

HIV enzyme
Structures of HIV protease (turquoise, PDB ID 3pj6) have been used to design powerful drugs for HIV therapy. Illustrator: Maria Voigt, RCSB Protein Data Bank

Stephen K. Burley believes that if the COVID-19 global pandemic has taught society anything, it is that sharing scientific information is key to saving precious time, avoiding duplication of effort, and accelerating the research needed to discover and develop new life-saving drugs and vaccines.

That didn’t happen 20 years ago, when Burley, a clinician-scientist, was head of research at SGX Pharmaceuticals, Inc., a cancer-focused biotechnology company located in California. Burley’s company deposited the first three-dimensional (3D) structure of a severe acute respiratory syndrome (SARS) coronavirus protein into the Protein Data Bank, the global open-access biostructure data resource containing more than 180,000 structures used worldwide by researchers to unlock the mysteries of human disease.

As chair of the scientific advisory board of the RCSB Protein Data Bank (PDB) at Rutgers University-New Brunswick in 2003, a decade before joining the faculty at Rutgers, Burley hoped data sharing by SGX Pharmaceuticals would help in the fight against the SARS epidemic.

He thought others would follow the SGX lead, share data and work together to discover drugs to treat the respiratory illness that emerged in China and spread to four other countries, including the United States, killing more than 800 individuals before it was brought under control. That did not happen.

He continued to believe the same when the Middle East respiratory syndrome (MERS) epidemic hit a decade later. Again, that did not happen.

“Our decision back in 2003 surprised competitors, because SGX structure data were made available without usage restrictions or royalty obligations,” said Burley, now University Professor and Henry Rutgers Chair at Rutgers-New Brunswick and director of the RCSB PDB and the Rutgers Institute for Quantitative Biomedicine. “I was confident that drug companies would build on the open-access data and produce anti-SARS drugs. Not one was forthcoming.”

As the Worldwide Protein Data Bank celebrates its 50th anniversary, Burley’s hopes have been realized. Scientists around the globe are working together using 3D protein structure information stored in the PDB to discover and develop vaccines and drugs that will protect the world’s population, not only against the current pandemic but also against future coronavirus outbreaks.

Stephen K. Burley
Nick Romanenko/Rutgers University

Managed by the Worldwide Protein Data Bank partnership, with data centers in the United States, Europe, and Asia, the PDB was cofounded by Helen M. Berman, Board of Governors Distinguished Professor Emerita of Chemistry and Chemical Biology at Rutgers-New Brunswick, in 1971 with only seven protein structures. Berman, who brought the PDB to Rutgers in 1998, led the organization until 2014. More than $5 billion in funding has been provided by the NIH to structural biologists in the U.S. who have generated more than 50,000 of the structures currently available from the PDB.

“The scientific landscape is much different than it was two or three decades ago, when sharing data in a competitive research environment was not considered as critical to scientific discovery and overall public health,” Burley said.

Within months of COVID-19 cases first appearing in late 2019, scientists based in Shanghai, China, deposited the first 3D structure of a crucial viral protein into the PDB. Today, more than 1,500 SARS-CoV-2 protein structures reside in the PDB, where they are made freely available to researchers, educators, and clinicians around the world.

“Public-private partnerships are happening more frequently now,” said Burley. “And the U.S. government has increased funding for research and technology development efforts aimed at combating future viral pandemics. Both developments are very welcome moves.”

Helen Berman
The Protein Data Bank was cofounded by Helen M. Berman, Board of Governors Distinguished Professor Emerita of Chemistry and Chemical Biology at Rutgers-New Brunswick, in 1971 with only seven protein structures. Berman brought the PDB to Rutgers in 1998.
Nick Romanenko/Rutgers University

Biomedical researchers using the structure data stored in the PDB have published more than two million scientific papers, some of which have helped researchers and pharmaceutical companies tackle major health challenges, including heart disease, cancer, diabetes, Alzheimer’s disease, and HIV/AIDS.

“Open access to 3D structure information from the PDB facilitated discovery and development of more than 90 percent of the 210 newly approved drugs by the United States Food and Drug Administration (FDA) in 2010-2016,” Burley said. “Looking more closely at the 54 new anti-cancer drugs approved by the FDA in 2010-2018, revealed that more than 70 percent of them were the products of structure-guided drug discovery accelerated by open access to PDB structures of the drug targets.”

Last month, scientists from the National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health, discovered how a key protein on the surface of the hepatitis C virus (HCV) might bind to a receptor protein found on some human cells. They hope this information will provide insights toward a much-needed vaccine for the chronic bloodborne virus, which has infected about 2.4 million people in the U.S.

Joseph Marcotrigiano, a senior investigator in NIAID’s Laboratory of Infectious Diseases and lead author of the new study, began this hepatitis C research while an associate professor of chemistry and chemical biology at Rutgers, using information from a 3D hepatitis C structure available through the PDB.

He and others credit Berman for her commitment 50 years ago to help create a database of protein structures that is now considered to be one of the most important open-access biodata resources available to scientists across the world.

“The advantage of the PDB is that once a new structural study has been published, the data are immediately made available to anyone in the world,” Marcotrigiano said. “And it goes both ways. Sometimes you need to use information from the PDB and other times you are giving over findings for others to use freely.”

Ann Stock, Distinguished Professor in the Department of Biochemistry and Molecular Biology at Robert Wood Johnson Medical School and associate director of the Center for Advanced Biotechnology and Medicine (CABM), said the data shared through the PDB are central to understanding biological systems at the molecular level, an integral part of the drug development that biotechnology and pharmaceutical companies undertake to treat human diseases.

“While some investigators wanted to keep information to themselves to guide their own investigations in the early days of structural biology,” Stock said, “the PDB enabled data sharing and had support of the government and the academic scientific community, who understood that this information was critical to researchers throughout the world.”

Today, this means that structural biology researchers who want to publish in peer-reviewed scientific journals must share their data via the PDB.

“Sharing of scientific data is something that has evolved with a lot of progress over the last couple of decades,” Stock said. “The PDB was one of the first databases that provided a comprehensive set of data for a particular field and set policies early on about what needed to be shared.”