The RCSB Protein Data Bank, which is headquartered at Rutgers University–New Brunswick and has transformed biology and medicine through its impact on research, education and drug discovery worldwide, now has far more digital storage space thanks to Amazon Web Services (AWS).

The new partnership with Amazon.com, Inc.’s cloud-computing arm, through its Open Data Sponsorship Program, is providing the Protein Data Bank with more than 100 terabytes of storage. The storage supports no-cost delivery of information to millions of scientists, educators and students throughout the world who work in fundamental biology, biomedicine, bioenergy, bioengineering and biotechnology. The partnership has more than doubled the digital storage capacity the data bank has at Rutgers.

“Since 1971, the global Protein Data Bank has enabled basic, translational and clinical research by providing open access to three-dimensional biostructure information at the atomic level,” said Stephen K. Burley, director of the RCSB Protein Data Bank and founding director of the Rutgers Institute for Quantitative Biomedicine, where the data bank is based. “Open access to Protein Data Bank information is central to accelerating scientific discoveries for the benefit of all humanity.”

The Protein Data Bank stores nearly 190,000 experimentally determined 3D structures of proteins, DNA and RNA that are freely available with no limitations on usage. The archive is jointly managed by the Worldwide Protein Data Bank partnership, which involves data centers in the U.S., Europe and Asia. The U.S. data center is operated by the RCSB PDB at Rutgers, the San Diego Supercomputer Center at the University of California, San Diego and the University of California, San Francisco. The RCSB PDB has operated the U.S. data center for the global Protein Data Bank for more than 20 years.

“The Protein Data Bank plays an important role in facilitating discovery and development of life-changing drugs,” said Burley, who co-leads the Cancer Pharmacology Research Program at Rutgers Cancer Institute of New Jersey. “Freely available 3D biostructure data constitute a public good with far-reaching impacts on patients and their families.”

Burley, a University Professor who holds a Henry Rutgers Chair, is an expert in structural biology, molecular biophysics, computational biology, data science, structure-guided/fragment-based drug discovery and clinical medicine/oncology. He used PDB resources even before he came to Rutgers.

Researchers using data stored in the Protein Data Bank have published more than 2 million scientific papers, some of which have helped scientists and pharmaceutical companies tackle major health challenges, including heart disease, cancer, diabetes, Alzheimer’s disease, HIV/AIDS and COVID-19.

Under the new program, AWS is covering the cost of storage and extraction for publicly available, high-value cloud-optimized datasets. Working with data providers, AWS aims to:

  • Provide open access to data by making it available for analysis on AWS
  • Develop new cloud-native techniques, formats and tools that lower the cost of working with data
  • Encourage the development of communities that benefit from access to shared datasets

“Access to open data sets is improving the way the scientific community can collaborate and accelerate life-changing discoveries,” said Josh Weatherly, director of U.S. education, state and local government verticals at AWS. “The Protein Data Bank provides a vast and diverse repository for researchers in government, academia and industry to use to develop diagnostics, vaccines, drugs, and other therapeutic treatments.”