AlphaFold: When Machine Learning meets Proteins

Silvia Illa

By Silvia Illa – Oct. 26, 2021

It’s been almost a year since DeepMind’s AlphaFold algorithm won CASP14, the 14th Critical Assessment of protein Structure Prediction competition. AlphaFold surprised the world in general, and the scientific community in particular, with the high accuracy of its predictions in the competition: its models had a median alpha-carbon RMSD of 0.96 Å at 95% residue coverage with respect to the experimental structures, while the next best algorithm had an RMSD of 2.83 Å [1].

Figure 1. In blue: chain A of the SARS-CoV-2 nucleocapsid dimerization domain (PDB 6WZQ), obtained by X-ray diffraction at a resolution of 1.45 Å. In yellow: model of the same structure generated with AlphaFold. RMSD = 0.902 Å. The model was generated with the AlphaFold notebook on Google Colab [2].
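
For readers who want to reproduce this kind of comparison, the sketch below shows one common way to compute an RMSD between matched alpha-carbon coordinates after optimal rigid superposition (the Kabsch algorithm), using NumPy. It is a minimal illustration, not the exact procedure used in CASP14 or for Figure 1: the function name kabsch_rmsd is ours, and it assumes both structures have already been reduced to equally long, residue-matched (N, 3) coordinate arrays in Å.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper for illustration: P and Q must list the same atoms
    (e.g. alpha carbons) in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, while its mirror image should not, because reflections are explicitly excluded from the fit.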

AlphaFold predicts the 3D coordinates of all heavy atoms of a given protein, using the primary amino acid sequence and aligned sequences of homologues as inputs [1]. According to its authors, the algorithm has been designed specifically for this task and employs novel neural network architectures and training procedures that embed evolutionary, physical, and geometric aspects of protein structures. The network has two main stages: the Evoformer and the Structure Module. The first block processes the inputs and outputs a refined representation of the MSA (multiple sequence alignment) together with a matrix describing pairs of residues. The second block then turns these representations into 3D positions for the residues [1].
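
To make the shapes of these two inputs more concrete, here is a toy sketch in Python with NumPy. It is not AlphaFold's featurization code: the alphabet, the function names encode_msa and relative_offsets, and the clipping value are our own assumptions, chosen only to show that the MSA is a sequences-by-residues array while the pair information is a residues-by-residues array.

```python
import numpy as np

# 20 standard amino acids plus a gap symbol (illustrative alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode_msa(aligned_sequences):
    """Turn aligned sequences into an integer array of shape (N_seq, N_res)."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    return np.array([[index[aa] for aa in seq] for seq in aligned_sequences])


def relative_offsets(n_res, max_offset=32):
    """Toy pair feature of shape (N_res, N_res): clipped sequence separation j - i."""
    idx = np.arange(n_res)
    return np.clip(idx[None, :] - idx[:, None], -max_offset, max_offset)


msa = encode_msa(["MKV-", "MRVL", "M-VL"])   # target sequence plus two homologues
pair = relative_offsets(msa.shape[1])

print(msa.shape)    # one row per sequence, one column per residue
print(pair.shape)   # one entry per pair of residues
```

In the real network both arrays carry learned feature vectors rather than single integers, and the final output is one set of 3D coordinates per residue.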

But what is the real potential of this algorithm in practice? Over the past year, much has been said about it. There is obviously a lot of justified excitement (even hype), but we have also discovered some limitations, as is to be expected of any method.

The excitement comes fundamentally from the fact that AlphaFold has made a breakthrough in the so-called “protein-folding problem”, which was introduced in 1972. This has a big impact on the field of molecular structural biology research and thus, by extension, on the field of drug discovery. For example, the large number of predicted protein models now available can be used as a starting point for the experimental structure determination of novel proteins, or for modelling large macromolecular complexes using methods like molecular dynamics. Another example is that AlphaFold can help in the study of complex biological systems, like DNA/RNA or protein-ligand complexes. In addition, AlphaFold has the potential to shift the focus of study from structure determination to other mechanistic and functional aspects of protein structures, like ligand screens or the investigation of dynamics [3-4].

There are also some limitations to the method. Proteins are dynamic systems, which means that they can adopt multiple conformations depending on the environment. AlphaFold does not take this into account. The algorithm also provides no information about the folding pathway. In addition, it is expected to produce low-confidence predictions in regions that are disordered or unstructured in isolation. Finally, the algorithm does not work with non-protein components like ligands, and thus it cannot predict whether a protein is in its active or inactive conformation [5].
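
In practice, these low-confidence regions can be spotted directly in the output files. AlphaFold reports a per-residue confidence score (pLDDT, on a scale from 0 to 100) and writes it into the B-factor column of the PDB files it produces. The sketch below, which uses only the Python standard library, lists the residues whose score falls below a cut-off. The function name low_confidence_residues and the default threshold of 70 are our own assumptions, and the parsing relies on the fixed-column layout of standard PDB ATOM records.

```python
def low_confidence_residues(pdb_text, threshold=70.0):
    """Return (chain, residue number, residue name, pLDDT) for low-confidence residues.

    Hypothetical helper: assumes an AlphaFold-style PDB file in which the
    B-factor column (columns 61-66) holds the per-residue pLDDT score.
    """
    flagged = []
    for line in pdb_text.splitlines():
        # One alpha-carbon record per residue is enough to read its score.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()
            chain = line[21]
            res_seq = int(line[22:26])
            plddt = float(line[60:66])
            if plddt < threshold:
                flagged.append((chain, res_seq, res_name, plddt))
    return flagged
```

Reading a predicted model with open(path).read() and passing the text to this function gives a quick list of the regions that deserve extra caution before the model is used in downstream work.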

Nevertheless, at least some of these limitations seem to be temporary. In just one year since its publication, others have already followed the path opened by AlphaFold and tackled some of its original weaknesses. Some authors have fine-tuned the algorithm for protein-peptide interactions, obtaining promising results [5], and other deep learning-based algorithms that are able to predict protein-protein interactions, like RoseTTAFold, have emerged [6].

Overall, we believe AlphaFold will mean a big leap forward for drug discovery, as it has boosted protein structure prediction and has done so in an “open source” way [7]. This is very likely to cause a rapid increase in the availability of reliable protein structural data in the short term. Pharmacelera’s team is currently exploring ways of exploiting this information to complement and improve our services and technologies, PharmScreen and PharmQSAR.

AlphaFold constitutes just another example of how Machine Learning is changing the rules of the game in life sciences in general. Pharmacelera will continue to follow its evolution, as we continue to implement new Machine Learning approaches in the description and discovery of potential new drugs.

And you, have you had the chance to try AlphaFold yourself? Which of its weaknesses do you think will be the most difficult to overcome? Let us know!

References

[1] Jumper J., Evans R., Pritzel A., et al., (2021), Highly accurate protein structure prediction with AlphaFold, Nature 596, 583-589.

[2] colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb

[3] Skolnick J., Gao M., et al. (2021), AlphaFold 2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function, J. Chem. Inf. Model.

[4] Tong A.B., Burch J.D., McKay D., et al., (2021), Could AlphaFold revolutionize chemical therapeutics?, Nat. Struct. Mol. Biol. 28, 771-772.

[5] Ko J., Lee J., (2021), Can AlphaFold2 predict protein-peptide complex structures accurately?, bioRxiv 453972.

[6] Mullard A., (2021), What does AlphaFold mean for drug discovery?, Nat. Rev. Drug Discov. 20, 725-727.

[7] github.com/deepmind/alphafold
