Background
AlphaFold is an AI system developed by Google DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment.
Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make these predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of UniProt (the standard repository of protein sequences and annotations). We provide individual downloads for the human proteome and for the proteomes of 47 other key organisms important in research and global health. We also provide a download for the manually curated subset of UniProt (Swiss-Prot).
In CASP14, AlphaFold was the top-ranked protein structure prediction method by a large margin, producing predictions with high accuracy. While the system still has some limitations, the CASP results suggest AlphaFold has immediate potential to help us understand the structure of proteins and advance biological research.
Let us know how the AlphaFold Protein Structure Database has been useful in your research, or if you have questions not answered in the FAQs, at alphafold@deepmind.com.
If your use case isn’t covered by the database, you can generate your own AlphaFold predictions using Google DeepMind’s Colab notebook or open source code. Both resources also support multimer prediction.
What’s new?
Integration of Foldseek search - September 2024
Foldseek is now available within AlphaFold DB (AFDB); users can efficiently search protein structures of interest against the AFDB50 and PDB collections. The integration lets users move between sequence and structural data, helping researchers understand protein architecture and its implications for biological function.
Users can view and sort search results by criteria such as hit significance (E-value) and sequence identity. Filtering by taxonomy is also available for more focused results. Once found and organised, the results can be downloaded for offline use.
What’s next?
We plan to continue updating the database with structures for newly discovered protein sequences, and to improve features and functionality in response to user feedback. Please follow Google DeepMind's and EMBL-EBI’s social channels for updates.
Licence and attribution
All of the data provided is freely available for both academic and commercial use under Creative Commons Attribution 4.0 (CC-BY 4.0) licence terms.
If you use this resource, please cite the following papers:
Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021).
Varadi, M et al. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Research (2024).
If you use data from AlphaMissense in your work, please cite the following paper:
Cheng, J et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (2023).
The structures and data provided in this resource are predictions with varying levels of confidence and should be interpreted carefully. The information is for theoretical modelling only. It is not intended, validated or approved for any clinical use.
FAQs
How does AlphaFold work?
DeepMind’s 2021 methods paper is the best reference for this. It gives an overview of the most important ideas, and there is a detailed description of all aspects of the system in the Supplementary Information. Visit our online training course to learn more about AlphaFold.
Note that the architecture of the system used at CASP14 differs significantly from the version used at CASP13, making it important to refer to the 2021 publication.
What is AlphaMissense?
AlphaMissense is an AI model that builds on Google DeepMind’s AlphaFold2 to categorise ‘missense’ mutations in different proteins as either ‘likely pathogenic’, ‘likely benign’ or ‘uncertain’, producing a score that estimates the likelihood of a variant being pathogenic. AlphaMissense leverages AlphaFold2’s capability to model protein structure, and its capacity to learn evolutionary constraints from related sequences. The implementation is closely aligned with AlphaFold2, with some architectural differences. AlphaMissense was used to classify the effects of all 216 million possible single amino acid substitutions across the 19,233 canonical human proteins.
Using an amino acid sequence as an input, AlphaMissense:
Note that AlphaMissense does not predict the change in protein structure, or biophysical properties such as stability, upon mutation. Instead, it uses related protein sequences and protein structure as contextual information to estimate pathogenicity.
For more information about AlphaMissense, please refer to the paper: Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (2023). AlphaMissense scores for all human missense variants are available on the Google Cloud Public Dataset.
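The three-way categorisation described above can be illustrated with a minimal sketch. It assumes the pathogenicity score lies between 0 and 1 and uses cut-offs of 0.34 and 0.564, which are the values we understand the AlphaMissense paper to report; neither the score range nor the cut-offs are stated on this page, so treat them as assumptions and check them against the paper and dataset documentation. The function name and parameters are hypothetical.

```python
def classify_variant(score, benign_below=0.34, pathogenic_above=0.564):
    """Map a pathogenicity score to one of three categories.

    The default cut-offs are assumptions taken from our reading of the
    AlphaMissense paper, not from this page; adjust them if needed.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    # Scores between the two cut-offs are not assigned to either class.
    return "uncertain"


if __name__ == "__main__":
    for s in (0.10, 0.45, 0.90):
        print(s, classify_variant(s))
```

Keeping the cut-offs as parameters makes it easy to apply a stricter or looser threshold for a particular analysis.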
What if I can’t find the protein I’m interested in?
If you can’t find the structure you’re looking for, here are some suggestions to improve your search results:
The AlphaFold source code and Colab notebook can be used to predict the structures of proteins not in AlphaFold DB. Both resources have been updated to support predicting multimer structures.
If you experience any issues with search, please contact afdbhelp@ebi.ac.uk.
How can I download and use the Predicted Aligned Error (PAE) file?
The PAE is displayed as an image for each of the structure predictions. If you need the raw PAE values for all residue pairs, you can download them as a JSON file using the button at the top of the structure page. This file is in a custom format that is not supported by any existing software, so you will have to use Python or another programming language to analyse or plot the information it contains.
The fields in the JSON file are:
[
{
"predicted_aligned_error": [[0, 1, 4, 7, 9, ...], ...], # Shape: (num_res, num_res).
"max_predicted_aligned_error": 31.75
}
]
We updated the PAE JSON file format on 28th July 2022 to reduce file size by 4x. Please ensure you read the 2D matrix of PAE values from the predicted_aligned_error field instead of the removed 1D "distance" field, and avoid using the old "residue1" and "residue2" fields. If you are using a script or third-party tool to read the PAE JSON file programmatically and you are seeing errors (e.g. missing field "distance"), check with the author of the program whether the latest PAE JSON format is supported.
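As a rough illustration of reading this format, the sketch below parses a PAE JSON file using only the Python standard library. The function names (parse_pae, load_pae, mean_pae) and the file name in the usage comment are hypothetical; only the predicted_aligned_error and max_predicted_aligned_error fields come from the format described above.

```python
import json
from pathlib import Path


def parse_pae(text):
    """Parse the text of a PAE JSON file into (matrix, max_pae)."""
    data = json.loads(text)
    # The file is a one-element list wrapping a single object.
    entry = data[0] if isinstance(data, list) else data
    if "predicted_aligned_error" not in entry:
        raise KeyError(
            "no 'predicted_aligned_error' field; "
            "the file may use the format from before July 2022"
        )
    matrix = entry["predicted_aligned_error"]
    num_res = len(matrix)
    # The matrix should have shape (num_res, num_res).
    if any(len(row) != num_res for row in matrix):
        raise ValueError("PAE matrix is not square")
    return matrix, entry["max_predicted_aligned_error"]


def load_pae(path):
    """Read a PAE JSON file from disk and parse it."""
    return parse_pae(Path(path).read_text())


def mean_pae(matrix, rows, cols):
    """Mean PAE over the residue pairs rows x cols (0-based indices)."""
    values = [matrix[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Tiny made-up example; real files hold one row per residue.
    example = (
        '[{"predicted_aligned_error": [[0, 2, 8], [2, 0, 6], [8, 6, 0]],'
        ' "max_predicted_aligned_error": 31.75}]'
    )
    matrix, max_pae = parse_pae(example)
    print(len(matrix), max_pae)
    print(mean_pae(matrix, range(0, 2), range(2, 3)))
    # For a downloaded file: matrix, max_pae = load_pae("my_pae_file.json")
```

The nested list returned here can be passed to a plotting or array library of your choice if you want to reproduce the PAE image or work with larger proteins more efficiently.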
Who should I contact with enquiries?
For questions and feedback about the AlphaFold DB website, please contact afdbhelp@ebi.ac.uk. To share feedback on structure predictions, or for other questions about AlphaFold not directly related to the database, please contact the AlphaFold team at alphafold@deepmind.com. We may not be able to respond to every query and there may be some delay before we can get back to you. Please do not share anything confidential with Google DeepMind. For press enquiries, please contact press@deepmind.com or comms@ebi.ac.uk.
EMBL-EBI training
Recorded webinar
Accessing and interpreting predicted protein structures from AlphaFold database
Online tutorial