AlphaFold generates a 3D rendering of the protein universe


AlphaFold predicts the structure of nearly every catalogued protein known to science. Credit: Karen Arnott/EMBL-EBI

DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have made AI-powered predictions of the 3D structures of nearly all catalogued proteins known to science freely and openly available to the scientific community, via the AlphaFold Protein Structure Database.

The two organizations hope that the expanded database will continue to increase our understanding of biology, and assist countless scientists in their work as they strive to meet global challenges.

This milestone marks a roughly 200-fold expansion of the database: it has grown from nearly one million protein structures to over 200 million, and now covers nearly every organism on Earth whose genome has been sequenced. The expanded database includes predicted structures for a wide range of species, including plants, bacteria, animals, and other organisms. This opens new avenues for research across the life sciences that will have an impact on global challenges, including sustainability, food insecurity and neglected diseases.

Predicted structures are now available for practically all protein sequences in the UniProt protein database. The release also opens up new research avenues, for example by supporting bioinformatics and computational work in which scientists look for patterns and trends across the database.

“AlphaFold now provides a 3D view of the protein universe,” said Edith Heard, Director General of EMBL. “The popularity and growth of the AlphaFold database is a testament to the successful collaboration between DeepMind and EMBL. It gives us a glimpse of the power of interdisciplinary science.”

“We are amazed at the rate at which AlphaFold has already become an essential tool for hundreds of thousands of scientists in laboratories and universities around the world,” said Demis Hassabis, founder and CEO of DeepMind. “From fighting disease to tackling plastic pollution, AlphaFold has already made an incredible impact on some of the biggest global challenges we face. We hope this expanded database will help countless scientists in their important work and open up entirely new frontiers of scientific discovery.”


Q8W3K0: A potential plant disease resistance protein. Credit: AlphaFold

An essential tool for scientists

DeepMind and EMBL-EBI launched the AlphaFold database in July 2021. At that time it contained more than 350,000 protein structure predictions, including the entire human proteome. Subsequent updates added UniProtKB/Swiss-Prot and 27 new proteomes, 17 of which represent neglected tropical diseases that continue to devastate the lives of more than a billion people globally.

Over 1,000 scientific papers have cited the database, and in just over a year more than 500,000 researchers from over 190 countries have accessed it to view more than two million structures.

The team has also seen researchers build on AlphaFold to create and adapt tools such as Foldseek and Dali, which let users search for entries similar to a given protein. Others have adopted the core machine learning ideas behind AlphaFold as the backbone of a range of new algorithms in this space, or applied them to areas such as predicting RNA structure and developing new models for designing proteins.

The impact and future of AlphaFold and the database

AlphaFold has also shown impact in areas such as improving our ability to combat plastic pollution, gaining insight into Parkinson’s disease, increasing honeybee health, understanding how ice forms, treating neglected diseases like Chagas disease and leishmaniasis, and exploring human evolution.

“We released AlphaFold in the hope that other teams could learn from and build on the progress we made, and it has been exciting to see that happen so quickly. Several other AI research organizations are now entering this field and building on the progress made by AlphaFold to achieve even more breakthroughs. This is truly a new era in structural biology, and AI-based methods will lead to amazing advances,” said John Jumper, Research Scientist and AlphaFold Lead at DeepMind.

“AlphaFold has sent ripples across the molecular biology community. In the past year alone, there have been over a thousand scientific articles on a wide range of research topics using AlphaFold structures; I’ve never seen anything like this before,” said Sameer Velankar, Team Leader at EMBL-EBI’s Protein Data Bank in Europe. “And this is just the effect of a million predictions; imagine the effect of having over 200 million publicly accessible protein structure predictions in the AlphaFold database.”

DeepMind and EMBL-EBI will continue to update the database periodically, with the goal of improving features and functionality in response to user feedback. Access to structures will remain fully open, under a CC-BY 4.0 license, and bulk downloads will be made available via public Google Cloud datasets.
