Submitted by Soupjoe5 in Futurology
Soupjoe5 OP wrote
Reply to comment by Soupjoe5 in AlphaFold’s new rival? Meta AI predicts shape of 600 million proteins by Soupjoe5
2
Meta’s network, called ESMFold, isn’t quite as accurate as AlphaFold, Rives’ team reported earlier this summer (ref. 2), but it is about 60 times faster at predicting structures, he says. “What this means is that we can scale structure prediction to much larger databases.”
As a test case, they decided to wield their model on a database of bulk-sequenced ‘metagenomic’ DNA from environmental sources including soil, seawater, the human gut, skin and other microbial habitats. The vast majority of the DNA entries — which encode potential proteins — come from organisms that have never been cultured and are unknown to science.
In total, the Meta team predicted the structures of more than 617 million proteins. The effort took just 2 weeks (AlphaFold can take minutes to generate a single prediction). The predictions are freely available for anyone to use, as is the code underlying the model, says Rives.
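To put the figures above in perspective, here is a back-of-the-envelope calculation using only the article's numbers: more than 617 million structures in about 2 weeks, and a speed-up of roughly 60 times over AlphaFold. Two assumptions are mine, not the article's: the 60x speed-up applies uniformly to every protein, and the slower model would run on the same hardware. The article does not say how many machines were used, so the rate below is aggregate throughput, not per-GPU speed.

```python
# Back-of-the-envelope arithmetic for the ESMFold metagenomic run.
# Inputs from the article: >617 million structures in about 2 weeks,
# and ESMFold being roughly 60 times faster than AlphaFold.
# Assumption: the 60x speed-up applies uniformly and on identical hardware.

SECONDS_PER_DAY = 24 * 60 * 60


def predictions_per_second(n_structures: float, days: float) -> float:
    """Aggregate throughput across all hardware used (not per GPU)."""
    return n_structures / (days * SECONDS_PER_DAY)


def equivalent_days(days: float, speedup: float) -> float:
    """Wall-clock days a model `speedup` times slower would need on the same hardware."""
    return days * speedup


if __name__ == "__main__":
    rate = predictions_per_second(617e6, 14)
    slow_days = equivalent_days(14, 60)
    print(f"~{rate:.0f} structures per second")
    print(f"~{slow_days:.0f} days (~{slow_days / 365:.1f} years) at 60x slower")
```

Under those assumptions the run works out to roughly 510 structures per second, and the same job would take about 840 days (a bit over two years) with a model 60 times slower.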
Of these 617 million predictions, the model deemed more than one-third to be high quality, such that researchers can have confidence that the overall protein shape is correct and, in some cases, can discern finer atomic-level details. Millions of these structures are entirely novel, and unlike anything in databases of protein structures determined experimentally or in the AlphaFold database of predictions from known organisms.
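ESMFold, like AlphaFold, reports a per-residue confidence score called pLDDT, and a "high quality" label is typically assigned by thresholding a score like this. The article does not give the exact criterion, so the sketch below is hypothetical: it assumes each prediction is summarized by its mean pLDDT on a 0-100 scale and uses a cutoff of 70. Both are illustrative choices, not the Meta team's documented settings, and the function name is made up for this example.

```python
# Hypothetical sketch: count predictions as "high confidence" by mean pLDDT.
# Assumptions (not from the article): scores are mean pLDDT values on a
# 0-100 scale and the cutoff is 70; the real pipeline's criteria may differ.
from typing import Iterable


def high_confidence_fraction(mean_plddt: Iterable[float], cutoff: float = 70.0) -> float:
    """Return the fraction of predictions whose mean pLDDT exceeds `cutoff`."""
    scores = list(mean_plddt)
    if not scores:
        raise ValueError("no predictions supplied")
    confident = sum(1 for s in scores if s > cutoff)
    return confident / len(scores)


if __name__ == "__main__":
    toy_scores = [92.1, 45.0, 71.3, 30.2, 66.8, 88.4]  # made-up illustrative values
    frac = high_confidence_fraction(toy_scores)
    print(f"{frac:.2f} of toy predictions pass the cutoff")
```

The toy scores are invented purely to exercise the function; they are not drawn from the ESMFold predictions.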
A good chunk of the AlphaFold database is made of structures that are nearly identical to each other, and ‘metagenomic’ databases “should cover a large part of the previously unseen protein universe”, says Martin Steinegger, a computational biologist at Seoul National University. “There’s a big opportunity now to unravel more of the darkness.”
Sergey Ovchinnikov, an evolutionary biologist at Harvard University in Cambridge, Massachusetts, wonders about the hundreds of millions of predictions that ESMFold made with low confidence. Some might lack a defined structure, at least in isolation, whereas others might be non-coding DNA mistaken for protein-coding material. “It seems there is still more than half of protein space we know nothing about,” he says.
Soupjoe5 OP wrote
3
Leaner, simpler, cheaper
Burkhard Rost, a computational biologist at the Technical University of Munich in Germany, is impressed with the combination of speed and accuracy of Meta’s model. But he questions whether it really offers an advantage over AlphaFold’s precision when it comes to predicting proteins from metagenomic databases. Language-model-based prediction methods, including one developed by his team (ref. 3), are better suited to quickly determining how mutations alter protein structure, which is not possible with AlphaFold. “We will see structure prediction become leaner, simpler, cheaper, and that will open the door for new things,” he says.
DeepMind doesn’t currently have plans to include metagenomic structure predictions in its database, but hasn’t ruled this out for future releases, according to a company representative. But Steinegger and his collaborators have used a version of AlphaFold to predict the structures of some 30 million metagenomic proteins. They are hoping to find new kinds of RNA viruses by looking for novel forms of their genome-copying enzymes.
Steinegger sees trawling biology’s dark matter as the obvious next step for such tools. “I do think we will quite soon have an explosion in the analysis of these metagenomic structures.”