
Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps

Published online by Cambridge University Press:  20 September 2022

Maytha Alshammari
Affiliation:
Department of Computer Science, Old Dominion University, Norfolk, VA, USA
Willy Wriggers*
Affiliation:
Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA, USA
Jiangwen Sun
Affiliation:
Department of Computer Science, Old Dominion University, Norfolk, VA, USA
Jing He
Affiliation:
Department of Computer Science, Old Dominion University, Norfolk, VA, USA
*Author for correspondence: Willy Wriggers, E-mail: wriggers@biomachina.org

Abstract

Recent breakthroughs in deep learning-based protein structure prediction show that it is possible to obtain highly accurate models for a wide range of difficult protein targets for which only the amino acid sequence is known. The availability of accurately predicted models from sequences can potentially revolutionise many modelling approaches in structural biology, including the interpretation of cryo-EM density maps. Although atomic structures can be readily solved from cryo-EM maps of better than 4 Å resolution, it is still challenging to determine accurate models from lower-resolution density maps. Here, we report on the benefits of models predicted by AlphaFold2 (the best-performing structure prediction method at CASP14) for cryo-EM refinement using the Phenix refinement suite for AlphaFold2 models. To study the robustness of model refinement at a lower resolution of interest, we introduced hybrid maps (i.e. experimental cryo-EM maps filtered to lower resolutions by real-space convolution). The AlphaFold2 models were refined to attain good accuracies above 0.8 TM scores for 9 of the 13 cryo-EM maps. TM scores improved for AlphaFold2 models refined against all 13 cryo-EM maps of better than 4.5 Å resolution, 8 hybrid maps of 6 Å resolution, and 3 hybrid maps of 8 Å resolution. The results show that it is possible (at least with the Phenix protocol) to extend the refinement success to maps of lower than 4.5 Å resolution. We even found isolated cases in which resolution lowering was slightly beneficial for refinement, suggesting that high-resolution cryo-EM maps might sometimes trap AlphaFold2 models in local optima.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NoDerivatives licence (http://creativecommons.org/licenses/by-nd/4.0), which permits re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Introduction

The advancement in protein structure determination and protein structure prediction from amino acid sequences has made the two initially independent paths more interconnected. On the one hand, experimental techniques, such as X-ray crystallography, NMR, and cryo-electron microscopy (cryo-EM), have driven the rapid growth of atomic structures deposited in the Protein Data Bank (PDB). The large number of high-quality 3D structures is an important asset in the investigation of functional mechanisms in biochemistry and structural biology. On the other hand, accurate atomic details have also fed a wealth of data to machine learning approaches in computational protein structure prediction. The quality of such predicted models has now sufficiently improved to have a real impact in imaging-based structure determination, such as in cryo-EM, where the resolution of the experimental maps is often too low to resolve individual atoms.

As of April 2022, 8,029 atomic structures have been solved from 9,752 cryo-EM maps with better than 4 Å resolution. Even in those high-resolution maps, there are often local regions of lesser quality that are challenging to interpret, but for the better-defined regions, the atomic structures are reliable down to the position of individual atoms. In addition, 2,195 models have been predicted from 3,344 cryo-EM maps with 4–6 Å resolution. It is still challenging to determine structures accurately in this ‘twilight zone’ of resolution due to the ambiguities of interpreting the shapes of amino acid side chains (Cheng, 2015; Casañal et al., 2019; Malhotra et al., 2019; He et al., 2022; Zhang et al., 2022). Recent studies have shown that the 3D prediction of atomic structures of proteins for which only the amino acid sequence is known can assist in the interpretation of cryo-EM maps when the quality of maps is insufficient to resolve atoms and amino acid side chains (Jiang et al., 2001; Topf et al., 2006; DiMaio et al., 2009, 2015; Baker et al., 2011; Lindert et al., 2012; Wang et al., 2015; Chen et al., 2016; Afonine et al., 2018; Terashi and Kihara, 2018; Zhang et al., 2020). Finally, there are also 1,066 atomic models in the PDB that were derived from 2,573 maps of medium resolution (6–10 Å), where the backbone of the polypeptide chain is generally no longer visible in the map. These models are predominantly derived by fitting known template structures into the maps (Wriggers et al., 1998, 2000; Tama et al., 2002; Chacon et al., 2003; Wriggers, 2010, 2012; Kovacs et al., 2018). A template structure can be an existing protein structure of a closely related protein or a model that is modified from an existing structure. The initial model must be similar to the structure of the target protein for fitting to low-resolution maps to be reliable (Egelman, 2008). Due to the limitations of such fitting, 6–10 Å resolution cryo-EM maps are also increasingly deposited without associated PDB models (95 in 2002–2009, 223 in 2010–2014, 645 in 2015–2019, and 567 since 2020). These recent trends in medium-resolution prolificacy call for new computational tools that enable such cryo-EM maps to bear atomic resolution fruit at a later time.

The rise of deep learning methods capable of producing highly accurate structures has recently revolutionised the computational protein structure prediction field. In the first 12 Critical Assessment of Protein Structure Prediction (CASP) meetings, the prediction accuracy for difficult targets was generally poor, with an overall Global Distance Test − Total Score (GDT_TS) (Martz, n.d.) of less than 50, the value above which a model generally represents a correct fold (Kryshtafovych et al., 2021). This was due to the challenge of handling proteins with previously unknown folds and to insufficient knowledge extracted from existing sequences and structures. However, the debut of deep learning led to a marked improvement in prediction accuracy. By CASP14 in 2020, AlphaFold2 had become the best-performing method across all levels of target difficulty (Kryshtafovych et al., 2021). Ranked by increasing difficulty, the challenge levels are Template-Based Modelling-easy (TBM-easy), Template-Based Modelling-hard (TBM-hard), Free Modelling/Template-Based Modelling (FM/TBM), and Free Modelling (FM). For 87 of the 92 domain targets, the best of five models submitted by the AlphaFold2 group of DeepMind achieved near experimental accuracy, with GDT_TS above 70 (Jumper et al., 2021a). The marked gain in accuracy for the most difficult targets in Free Modelling represents a significant advance in the state of the art in protein structure prediction (Jumper et al., 2021a; Kryshtafovych et al., 2021).

The success in predicting Free Modelling targets at CASP was largely due to the improved prediction of residue contact distances, beyond a yes or no answer (Hou et al., 2019; Xu, 2019). Coevolution can be related to statistical dependencies that encode the contact between two residues. For example, if one residue of a contacting pair changes to a positively charged residue, the other is likely to change to a negatively charged residue. Deep learning methods, such as MULTICOM, TripletRes, DeepPotential, tFold, and RaptorX, have been shown to be effective in uncovering residue coevolutionary patterns among homologous sequences (Guo et al., 2021; Li et al., 2021a, 2021b; Shen et al., 2021; Xu et al., 2021). Due to such improvements, other structure prediction methods, such as RoseTTAFold, QUARK, and MULTICOM, have also recently shown improved model accuracy (Yang et al., 2020; Baek et al., 2021; Zheng et al., 2019, 2021; Hou et al., 2020; Wu et al., 2021).

The availability of highly accurate predicted models potentially transforms many studies in structural biology, but the impact of AlphaFold2 models remains to be studied in more detail in various specific applications. In the related Molecular Replacement problem in X-ray crystallography (which relies upon the existence of a model that is similar to the unknown structure from which the diffraction data is derived), a recent study shows that 30 of 32 models produced by AlphaFold2 in CASP14 can be successfully used as search models (Pereira et al., 2021). In cryo-EM, a recent study showed that 22 of 25 AlphaFold2 models can be used as initial models to produce models with over 90% alpha-carbon accuracy when they are refined using high-resolution cryo-EM maps up to about 4 Å resolution (Terwilliger et al., 2022). However, as of yet, little is known about the benefit of AlphaFold2 for interpreting cryo-EM maps of lower resolution, where certain chains and regions do not have a known available template structure that could be fitted.

One of the difficulties in the evaluation of computational methods that apply to lower-quality maps is the lack of sufficient benchmark data. Although many atomic models have been derived from cryo-EM maps between 4 and 10 Å resolution, it is challenging to validate those models. For example, a misalignment of corresponding atomic structures has been reported for helix regions (Wriggers and He, 2015; Sazzed et al., 2020) of lower-resolution cryo-EM maps. Due to challenges in obtaining reliable (experimentally derived) map-model pairs at lower resolutions, the simulation of cryo-EM density maps has become important.

Existing methods for simulating density maps (either in direct space or Fourier space) are based on the convolution of atom points with a resolution-lowering point-spread function. In the pdb2mrc program of EMAN, the molmap function in Chimera, and the pdb2vol function of Situs, a 3D density map is produced using a Gaussian point-spread function whose real-space dimension corresponds to a desired resolution value, depending on the specific resolution convention of the packages (Ludtke et al., 1999; Pettersen et al., 2004; Wriggers, 2012). In this study, we propose a new way to produce a hybrid density map based on a Gaussian convolution of an experimental cryo-EM map (instead of an atomic structure). The variable resolution value adds a new dimension to the method validation. As a bonus, the approach also incorporates any quality variation within the parent high-resolution cryo-EM map into the hybrid map, resulting in a more realistic low-resolution density model.
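The convolution of atom points with a Gaussian point-spread function can be illustrated in one dimension. The following is a minimal sketch and not the implementation of pdb2mrc, molmap, or pdb2vol: the function name `simulate_profile`, the unit atom weights, the toy coordinates, and the use of a plain Gaussian standard deviation `sigma` (rather than a package-specific resolution value) are assumptions made for illustration.

```python
import math

def simulate_profile(atom_positions, sigma, grid_points):
    """Sum one Gaussian per point atom at each grid point (1D toy density).

    atom_positions and grid_points are coordinates in angstroms; sigma is
    the Gaussian standard deviation in angstroms (an assumed stand-in for
    a package-specific resolution convention).
    """
    profile = []
    for x in grid_points:
        density = sum(
            math.exp(-((x - a) ** 2) / (2.0 * sigma ** 2))
            for a in atom_positions
        )
        profile.append(density)
    return profile

# Two hypothetical point atoms 3.8 angstroms apart, sampled every 0.5 angstroms.
grid = [i * 0.5 for i in range(-20, 29)]
sharp = simulate_profile([0.0, 3.8], sigma=1.0, grid_points=grid)
blurred = simulate_profile([0.0, 3.8], sigma=3.0, grid_points=grid)
```

With the narrow Gaussian the two atoms remain separate peaks with a dip between them, whereas with the wide Gaussian they merge into a single maximum, which is the loss of detail that resolution lowering is meant to reproduce.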

Using hybrid maps, we can monitor any change in the effect of the refinement of AlphaFold2 models when Phenix software is applied at specific resolution values. Phenix is a Python-based refinement suite that was historically developed for X-ray crystallography and is therefore most suitable for high-resolution cryo-EM density maps (better than 4.5 Å, according to Terwilliger et al. (2022)). The Phenix refinement protocols we used here were tightly integrated with AlphaFold2 and rely on specific outcomes of the AlphaFold2 prediction process (see Methods). An earlier study by Terwilliger et al. (2022) already demonstrated that AlphaFold2 models can be refined against high-resolution cryo-EM density maps, but the utility of the approach was not conclusive for cryo-EM maps with lower than 4 Å resolution, since only three such cases were tested and they achieved mixed success. In the present work, we tested a revised set of experimental high-resolution maps, and we also explored the impact of the refinement of AlphaFold2 models using hybrid maps of progressively sampled lower resolutions of 5, 6, 8, 10, and 12 Å. The refinement against such lower resolution maps is not the original scope of Phenix, but it is important to us and to many other groups that focus on modelling cryo-EM maps across a wider resolution range. Our results demonstrate the potential for AlphaFold2 models to be applied in lower than 4 Å resolution maps through refinement.

Methods

Both experimental cryo-EM maps between 2 and 4.5 Å resolution (see section ‘The data’) and hybrid maps (see section ‘Hybrid experimental-simulated density maps’) between 5 and 12 Å were used in the study. AlphaFold2 is accessible both as a standalone copy, which can be downloaded and installed locally, and through web services established by DeepMind and third-party groups (Jumper et al., 2021b; Mirdita et al., 2021). The refinement function of Phenix for AlphaFold2 models is also accessible both from a locally installed Phenix distribution and from its cloud service through Google Colab. In this study, most of the refinements of AlphaFold2 models were conducted using the free-membership Colab server of Phenix because of their tight integration, but a few cases were conducted using the local copy of Phenix for a fine-tuning of parameters (see details in section ‘Structure prediction using AlphaFold2 and refinement using Phenix’).

The data

Since the goal of the work was to study the effect of an existing refinement procedure on density maps of different resolutions, a data set of 13 cases was created. Of these, 12 were used in the study of Terwilliger et al. (2022), and one was added. The newly added case is a Free Modelling target in CASP14 (CASP ID T1047S1D1), and it has a cryo-EM map associated with it (EMDB 12183, PDB 7BGL chain A). The atomic structure of this case was downloaded from the PDB in March 2022. The other 12 structures listed in Table 1 were provided from the repository of Terwilliger et al. (2022); they represent recent, unique structures of between 100 and 1,000 amino acids, downloaded in August 2021, each with a cryo-EM map of 4.5 Å resolution or better. Each case consisted of a sequence of amino acids, its corresponding density map, and an atomic structure (Table 1). Cryo-EM maps were downloaded from the Electron Microscopy Data Bank (EMDB), as indicated by the ID number in Table 1.

Table 1. Accuracy of models before and after refinement using high-resolution cryo-EM maps and hybrid density maps of 5, 6, and 8 Å resolutions

a Protein IDs (PDB ID_EMDB ID_Chain ID). For the two chains involved in CASP challenges, CASP target IDs are indicated.

b The number of amino acids in the protein.

c The resolution of cryo-EM maps.

d The average pLDDT scores of AlphaFold2 models.

e The accuracy is indicated as TM scores for models obtained from AlphaFold2.

f TM scores for models refined using Phenix (Terwilliger et al., 2022), the cryo-EM maps (High) and the hybrid density maps at 5, 6, and 8 Å, respectively.

g The Phenix resolution parameter was tuned 2–3 Å lower than the nominal resolution of the density map to ensure completion of the refinement protocol (see text).

Hybrid experimental-simulated density maps

In this work, there was a need to adjust the resolution of cryo-EM maps used in the validation of the Phenix refinement of the AlphaFold2 models. The adjustment had to be done on specific maps, since our tests below show that the performance of the refinement varies greatly between systems. Traditionally, there have been methods in EM modelling that lower the resolution of atomic structures to create ‘simulated’ cryo-EM maps, such as the pdb2vol tool of Situs (Wriggers, 2012). However, such simulated maps would not mimic the unique features of experimental cryo-EM maps, such as structural deviations, uneven local resolution, noise, structural flexibility and disorder, or the specific image processing effects of the 3D reconstruction process. Therefore, we designed a novel hybrid experimental-simulated density map, using a high-resolution experimental map as a basis for the resolution lowering instead of an atomic structure. To re-use the existing resolution lowering code (pdb2vol) in Situs, the cryo-EM density format was first converted with the vol2pdb tool, with each density voxel represented by a PDB ATOM record that stores the voxel density in the PDB occupancy field. Each density voxel was then convolved with a Gaussian filter using a modified version of pdb2vol, with a filter size determined by the desired resolution of the hybrid map. The final resolution of the hybrid map depends on both the pre-existing (fixed) resolution $R_e$ of the experimental map, and the user-controlled resolution parameter $R_s$ of the pdb2vol convolution. The relationship is straightforward because the resolution point spread of the experimental map can itself be approximated by a Gaussian of resolution $R_e$. In this case, the convolution of two Gaussians is simply a Gaussian with a larger resolution value $R_h = \sqrt{R_e^2 + R_s^2}$ (Bromiley, 2003).
For a desired hybrid target resolution $R_h$, and a cryo-EM map with pre-existing resolution $R_e$, the required resolution parameter $R_s$ of the Gaussian filter can be computed by solving this relationship for $R_s$, that is, $R_s = \sqrt{R_h^2 - R_e^2}$. Hybrid density maps of $R_h$ = 5, 6, 8, 10, and 12 Å resolution were created for each case in this fashion.
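The filter parameter follows from rearranging the quadrature relationship between the three resolution values. The sketch below shows that arithmetic only; the function name `filter_resolution` and the 3.0 Å parent map are illustrative assumptions and are not values or code from this study.

```python
import math

def filter_resolution(r_hybrid, r_experimental):
    """Return R_s such that sqrt(R_e^2 + R_s^2) equals the target R_h."""
    if r_hybrid <= r_experimental:
        # Gaussian filtering can only make the resolution value larger.
        raise ValueError("target resolution value must exceed that of the parent map")
    return math.sqrt(r_hybrid ** 2 - r_experimental ** 2)

# Hypothetical parent map at 3.0 angstrom resolution (not a value from Table 1).
for target in (5.0, 6.0, 8.0, 10.0, 12.0):
    r_s = filter_resolution(target, 3.0)
    print(f"R_h = {target:4.1f} A  ->  R_s = {r_s:.2f} A")
```

Because the values add in quadrature, the correction matters most when the target is close to the parent resolution; for targets far above $R_e$, the required $R_s$ approaches $R_h$ itself.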

The detailed relationship between resolution values and dimensions of the Gaussian for various methods, including Situs, is described in section ‘Discussion and conclusion’ of Wriggers (2012). Resolution conventions differ significantly among software tools, since no uniform standards exist in the experimental and theoretical communities (Wriggers, 2012). The Situs resolution convention (double the 3D standard deviation of the Gaussian) is different from EMAN2 and UCSF Chimera and was designed to show features at comparable levels of detail with published experimental maps, so we expect that the $R_e$ and $R_s$ values in the above formula are compatible. However, users should be aware that this assumption should ideally be tested with a detailed resolution analysis, especially if different packages are used for the calculation of $R_s$.
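To make the stated Situs convention concrete, the following sketch converts a resolution value into Gaussian widths. It takes the convention exactly as described in the text (resolution equals twice the 3D standard deviation). The additional step to a per-axis width assumes that the 3D standard deviation is the root-mean-square radial deviation of an isotropic Gaussian, so that it equals $\sqrt{3}$ times the per-axis value; that reading, and all function names, are assumptions that should be checked against the Situs documentation before use.

```python
import math

def situs_sigma_3d(resolution):
    """3D standard deviation, taking resolution = 2 * sigma_3D as stated in the text."""
    return resolution / 2.0

def per_axis_sigma(sigma_3d):
    """Per-axis sigma of an isotropic 3D Gaussian.

    Assumes sigma_3D^2 = 3 * sigma_1D^2 (RMS radial deviation), which is an
    interpretation and not confirmed by the source.
    """
    return sigma_3d / math.sqrt(3.0)

def sigma_in_voxels(resolution, voxel_size):
    """Per-axis Gaussian width expressed in grid units for a given voxel size."""
    return per_axis_sigma(situs_sigma_3d(resolution)) / voxel_size
```

A helper of this kind is only meaningful within one convention; mixing a resolution value defined by one package with the Gaussian width definition of another would silently change the effective filter.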

Structure prediction using AlphaFold2 and refinement using Phenix

The overall idea in refinement is to first identify the most consistent model among a set of suggested models from AlphaFold2. The selected model is then processed to trim unreliable residues using the per-residue confidence scores produced by AlphaFold2. The resulting more reliable regions of the model are broken up into domains and docked in the density map, whilst maintaining the connectivity relationship among domains. The model is then morphed and rebuilt using a density map (Terwilliger, n.d.). Briefly, this involves the fitting of the segments and the modelling of connecting loops using various techniques such as refinement, tracing, loop building, and chain growing. Detailed Phenix instructions for refining AlphaFold2 models are available online (Thomas, n.d.).
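The selection and trimming steps can be sketched with plain PDB text, since AlphaFold2 writes the per-residue pLDDT into the B-factor column of its output files. This is a simplified illustration and not the Phenix procedure: the function names, the toy records, and the cutoff of 70 are assumptions, and the real protocol additionally splits the model into domains.

```python
def make_ca_line(res_num, plddt):
    """Build a minimal PDB ATOM record for a CA atom (toy data only)."""
    return ("ATOM  {:5d}  CA  ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(res_num, res_num, 0.0, 0.0, 0.0, 1.0, plddt))

def ca_plddt(pdb_lines):
    """Map residue number to pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def mean_plddt(scores):
    """Average pLDDT over all residues of one model."""
    return sum(scores.values()) / len(scores)

def pick_most_confident(models):
    """Return the name of the model with the highest average pLDDT."""
    return max(models, key=lambda name: mean_plddt(ca_plddt(models[name])))

def confident_residues(scores, cutoff=70.0):
    """Residue numbers kept after trimming; the cutoff is a hypothetical choice."""
    return sorted(res for res, score in scores.items() if score >= cutoff)

# Two toy three-residue models with made-up confidence values.
toy_models = {
    "model_a": [make_ca_line(i, s) for i, s in enumerate([90.0, 85.0, 40.0], start=1)],
    "model_b": [make_ca_line(i, s) for i, s in enumerate([60.0, 65.0, 55.0], start=1)],
}
best = pick_most_confident(toy_models)
kept = confident_residues(ca_plddt(toy_models[best]))
```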

Although AlphaFold2 software can be downloaded and installed on local machines, a simple way to obtain a predicted model is to use its service set up on the Google Cloud Platform. Recently, a convenient web interface was established on Google Cloud that initiates a task to run AlphaFold2 and then refines the model using the functions of Phenix software and a density map (Jumper et al., 2021b; Mirdita et al., 2021; Terwilliger et al., 2022). We utilised such cloud services for 11 of the 13 cases to collect models generated from AlphaFold2 and to conduct subsequent refinement using Phenix. Specifically, models were obtained from a Google Colab Notebook ‘AlphaFold with a density map’, a Python code environment for Google Cloud services (Google Colab Notebook, n.d.). Default parameters were used, except for the number of iterations of refinement. Only one iteration of refinement was performed, rather than the four iterations performed in the study of Terwilliger et al. (2022), because our tests showed that the conclusions of this paper did not depend on the number of iterations. For two cases (7LV9–23530-B and T1047S1D1–7BGL-12183-A), the downloaded Phenix software, instead of the Colab server, was used. Regarding the lower-confidence prediction 7LV9–23530-B, the maximum_rmsd parameter was fine-tuned in the local copy to 2.5 Å, instead of the default of 1.5 Å provided by the Colab server, for enhanced sampling. In the case of T1047S1D1–7BGL-12183-A, a local run was necessary because the trial on the Colab server exceeded the time limit of the free account.
The same version of Phenix, dev-4536, was used on both the Colab server and the local copy.

To prepare the density map for refinement, we followed the Phenix documentation and applied the tools phenix.local_aniso_sharpen and phenix.map_box. The map resulting from these steps was a sharpened, rectangular cropped region containing the chain of interest. The nominal density map resolution was used as an upper bound for the Phenix ‘high-resolution limit’ of the main search. The documentation recommends trying the nominal resolution first, and lowering the resolution (i.e. increasing the parameter value in Å) as needed for a ‘quicker search’ or to compensate for model quality. We found that the Phenix refinement against the experimental cryo-EM maps was completed without any such adjustment. However, for some of the lower resolution hybrid density maps (Table 1), the refinement failed at the docking stage. Therefore, as recommended by the instructions, a resolution parameter 2–3 Å larger than the nominal map resolution was used in these cases.
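The fallback described above amounts to a short retry loop. The sketch below is schematic: `run_refinement` is a hypothetical user-supplied callable standing in for whatever launches the refinement, it is not a Phenix function, and the use of `RuntimeError` to signal a failed docking stage is likewise an assumption.

```python
def refine_with_fallback(run_refinement, nominal_resolution, offsets=(0.0, 2.0, 3.0)):
    """Try the nominal resolution first, then values 2 and 3 angstroms larger.

    run_refinement is a hypothetical callable that takes the resolution
    parameter and returns a model, or raises RuntimeError if a stage such
    as docking fails.
    """
    last_error = None
    for offset in offsets:
        resolution = nominal_resolution + offset
        try:
            return resolution, run_refinement(resolution)
        except RuntimeError as error:
            last_error = error  # remember the failure and try a larger value
    raise RuntimeError("refinement failed at every resolution tried") from last_error
```

Returning the resolution value that finally succeeded alongside the model makes it straightforward to record which cases needed the adjustment, as footnote g of Table 1 does.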

Results

This study aims to evaluate the accuracy of models obtained using the AlphaFold2 method and those refined using both cryo-EM maps of high resolutions and hybrid maps of lower resolutions. Among the models produced from AlphaFold2, the model selected by Phenix software was used in the evaluation of accuracy and subsequent refinement. The selected model represents the one with the best confidence based on the average predicted local distance difference test (pLDDT) among the list of suggested models from AlphaFold2 (Terwilliger et al., 2022). The pLDDT (Jumper et al., 2021b) is a per-residue confidence metric on a scale from 0 to 100, and it estimates how well a prediction would agree with the true structure based on the local distance difference test on Cα atoms (Mariani et al., 2013). The TM-align method calculates actual structural similarity using heuristic dynamic programming iterations, and it allows the comparison of two models that are not similar in certain regions (Zhang and Skolnick, 2005). Each model was aligned with the true structure using TM-align, and the TM-score was used for an estimation of the accuracy of the model. (Note that the amino acid sequence submitted to the AlphaFold2 server is longer if the corresponding atomic structure misses a segment of the sequence in structure determination; we used the length of the true structure for TM score normalisation.) In the following, we describe our validation studies on experimental high-resolution cryo-EM maps (section ‘AlphaFold2 models and improved accuracy using high-resolution cryo-EM maps’) and on lower-resolution hybrid maps (section ‘Refinement of AlphaFold2 models using hybrid maps’).
This is followed by a secondary structure analysis (section ‘Secondary structure analysis of refinement performance’) to characterise the observed performance.
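The role of the normalisation length can be seen in a minimal sketch of the TM-score sum. It assumes the commonly quoted definition, in which each aligned residue pair at distance $d_i$ contributes $1/(1+(d_i/d_0)^2)$ with $d_0 = 1.24\,(L-15)^{1/3} - 1.8$, and it takes the distances after superposition as given. It does not perform the alignment search that TM-align carries out, and the floor of 0.5 Å on $d_0$ for very short chains is an assumption about how such cases are handled.

```python
def tm_d0(l_norm):
    """Distance scale d0 for a normalisation length l_norm (in residues)."""
    if l_norm <= 21:
        return 0.5  # assumed floor for very short chains
    return 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, l_norm):
    """TM-score sum for aligned residue-pair distances (angstroms).

    l_norm is the normalisation length; in this study it would be the
    length of the true structure. Unaligned residues contribute nothing.
    """
    d0 = tm_d0(l_norm)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm
```

Normalising by the length of the true structure means that a model covering only half of the residues perfectly scores 0.5, so extra or missing residues in the submitted sequence do not inflate the score.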

AlphaFold2 models and improved accuracy using high-resolution cryo-EM maps

For the 13 cases tested, the accuracy of models obtained from AlphaFold2 is quite good, since 11 of them show higher than 0.7 TM-score, and eight models have higher than 0.8 TM-score (Table 1). The TM scores correlated with average pLDDT values (Table 1), suggesting that AlphaFold2 pLDDT scores predict the refinement success to some extent. (However, small local errors that are undetected by the pLDDT averaging can have global structural consequences, so the TM score was used as a standard for the validation against the true structures.)

An example with a 0.82 TM-score shows that the overall fold and secondary structure elements, such as helices and β-strands, are correct (Fig. 1a). Minor inaccuracies remain in the model in terms of the length of the secondary structures, the loop, and the relative positioning of the two secondary structures. For a case with a TM score of 0.53, one of the two cases with a score less than 0.7, the fold of the model is still correct, and the secondary structures are well-predicted (Fig. 2a). This chain was a target in the difficult Free Modelling category of CASP14. Although our current AlphaFold2 model was obtained from the Colab server of AlphaFold2, it is similar to the model submitted in CASP14 (data not shown). One of the 13 test cases showed poor model accuracy, with a TM-score of 0.39 (Table 1). The main deficiency of the model is that two shorter helices were predicted as one long helix, which affected the overall fold of the chain (Fig. 3a).

Fig. 1. Models obtained from AlphaFold2 and the refinements using Cryo-EM map 23274–7LCI-R (EMDB-PDB-chain ID) and hybrid density maps at 6 and 8 Å resolutions. (a) Superposition of the protein structure (red, chain R of 7LCI) and the model obtained from AlphaFold2. (b1) The box-cropped region of cryo-EM map 23274 (EMDB ID, cyan) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red, chain R of 7LCI) and the refined model (blue) using Phenix and the box-cropped cryo-EM map in b1. Hybrid density maps of 6 Å (grey in c1) and 8 Å (yellow in d1) resolutions are superimposed with the corresponding models refined from the maps, respectively. The 6 Å-map-refined model (Cyan ribbon in c1, c2) and 8 Å-map-refined model (green in d1, d2) are superimposed with the structure (red) in c2 and d2. The superposition of two atomic models was performed with TM-align (Zhang and Skolnick, 2005) in all figures. The superposition of a density map and a model was performed using Phenix (Terwilliger et al., 2022) in all figures. An example of a weaker density region in the cryo-EM map and in the corresponding hybrid maps is indicated by an ellipse in b1, c1, and d1.

Fig. 2. Models obtained from AlphaFold2 and the refinements using Cryo-EM map 12183–7BGL-A-T1047S1D1 (EMDB-PDB-chain ID-CASP14 target ID) and hybrid density maps at 6 and 8 Å resolutions. (a) Superposition of the protein structure (red, chain A of 7BGL) and the model obtained from AlphaFold2 (yellow). This chain is one of the free modelling targets in CASP14 with ID T1047S1D1. (b1) The box-cropped region of cryo-EM map 12183 (EMDB ID, cyan) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red, chain A of 7BGL) and the refined model (blue) using Phenix and the box-cropped cryo-EM map in b1. Hybrid maps of 6 Å (grey in c1) and 8 Å (yellow in d1) resolutions are superimposed with the model refined from the corresponding map. The 6 Å-map-refined model (Cyan ribbon in c1, c2) and 8 Å-map-refined model (green in d1, d2) are superimposed with the structure (red) in c2 and d2.

Fig. 3. Predicted models using AlphaFold2 for 7LV9-B and 7L6U-A (PDB ID–Chain ID). The structures (red) and models predicted using AlphaFold2 (yellow) are superimposed for chain B of 7LV9 (a) and chain A of 7L6U (b). See the Supplementary Material for more details about the two cases.

The refinement of AlphaFold2 models using Phenix and high-resolution cryo-EM maps was successful, since an improvement in accuracy was observed in all 13 cases (Table 1). This observation is similar to the results of Terwilliger et al., even though there are minor differences in the data, the number of refinement iterations, and the evaluation of model accuracy: we evaluated accuracy using TM scores instead of the percentage of alpha-carbons, and we added a new CASP target to the test data. Our results show that the high-resolution cryo-EM maps and the refinement method proposed by Terwilliger et al. are capable of correcting model errors. In particular, for the eight best models obtained from AlphaFold2, with TM scores above 0.8, the refinement consistently brought them to near-experimental accuracy, with TM scores near or above 0.9 (Table 1). For the three models with TM scores between 0.7 and 0.8, the enhancement is modest, producing models with TM scores near 0.8 after refinement. For the poor model with a TM score of 0.39, the enhancement is limited, since the refined model has a TM score of 0.42. Our results show that the level of enhancement is related to the quality of the initial model: initial models with TM scores better than 0.8 consistently reach near-experimental accuracy. It is worth mentioning that the refinement was conducted using a box-cropped region of the cryo-EM map near the protein chain. Without knowledge of the boundary of the chain, a box-cropped region often contains partial density of neighbouring chains; therefore, refinement against such a boxed region is harder than refinement against a region masked by the envelope of the chain. If knowledge about neighbouring chains were available, the refinement might be easier. The experiment in this study tests the intrinsic power of the density map in refinement without any knowledge of neighbouring chains, and we observe that the high-resolution cryo-EM maps do have the power to refine initial models obtained from AlphaFold2. The limited enhancement in the case of 7LV9 may be related to a combination of factors, such as the small size of the chain, the accuracy of the model, and the resolution of the density map (Table 1). This case has the lowest accuracy for the initial model obtained from AlphaFold2 and the lowest resolution (4.5 Å) in the data set.
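
For readers less familiar with the accuracy measure used throughout, the following minimal sketch evaluates the standard TM-score formula for a fixed set of aligned Cα distances. It is an illustration only: TM-align additionally searches for the superposition that maximises this sum, which is omitted here, and the function name and its arguments are hypothetical.

```python
def tm_score(distances, target_length):
    """TM score for one fixed superposition (no search over superpositions).

    distances: Cα-Cα distances in Å for the aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    if target_length <= 15:
        # The length-dependent scale below is not meaningful for short chains.
        raise ValueError("target_length must be greater than 15")
    # Length-dependent distance scale d0 (Å).
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Each aligned pair contributes a value between 0 and 1;
    # unaligned residues contribute nothing.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is normalised by the length of the reference structure, residues that are missing from the alignment lower the score, which is one reason a score near 0.9 indicates close agreement over nearly the whole chain.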

Refinement of AlphaFold2 models using hybrid maps

For each experimental cryo-EM map in the previous section, hybrid density maps were generated at resolution values of 5, 6, 8, 10, and 12 Å. The same Phenix refinement procedure as in the previous section was applied to the hybrid maps at each resolution. As the map resolution was progressively lowered from 5 to 12 Å, the performance of the refinement procedure generally degraded. Among the 13 cases, the number with enhanced model accuracy after refinement is 13 for the high-resolution cryo-EM maps, but it drops to 9, 6, 1, 0, and 0 when hybrid density maps of 5, 6, 8, 10, and 12 Å resolution were used, respectively (Fig. 4 and Table 1). Our results show that the current refinement method is most suitable for maps with resolutions better than 6 Å.

Fig. 4. Accuracy of models measured using TM-align. The TM score of each model was calculated against the protein structure downloaded from the PDB for 13 cases. In each case, the accuracy is shown from left to right for the model obtained using AlphaFold2 (black), refinement using Phenix and cryo-EM maps (green), refinement using hybrid map of 5 Å (red), 6 Å (blue), 8 Å (yellow), 10 Å (grey), and 12 Å (light blue) resolutions.

We observed that 6 Å was a breakeven point: at lower resolutions (larger Å values) the refinement predominantly degrades the AlphaFold2 models, whereas at higher resolutions most of them are improved. We therefore analysed the breakeven point in more detail. When hybrid density maps at 6 Å were used in refinement, almost half (6 of the 13 cases) exhibited improved model accuracy (Table 1 and Fig. 4). This shows that the hybrid maps at 6 Å still have the potential to correct the initial models obtained from AlphaFold2. We also observed that all six cases started from already reasonable initial models with TM scores of 0.76 to 0.90. The three most enhanced cases are 7LCI (enhanced from a TM score of 0.82 to 0.94), 7L6U (from 0.90 to 0.95), and 7LX5 (from 0.89 to 0.92). In the case of 7LCI, the enhancement appears to be mostly in the β-sheet region of the chain (Fig. 1a, c2).

In the remaining seven cases, the model accuracy at 6 Å decreased. The performance for all cases also degrades significantly at 8 or 10 Å resolution (Fig. 4) due to our refining outside the high-resolution design parameters of Phenix (note that when high-resolution cryo-EM maps were used in refinement, the model accuracy was enhanced for all 13 cases). Fig. 1b1, c1, d1 shows one example where weak density in the cryo-EM and related hybrid maps (ellipse) diminishes the refinement accuracy at 8 Å resolution.

The number of successful (improved) cases increased from six to nine when hybrid maps of 5 Å instead of 6 Å resolution were used in the refinement (Fig. 4, red bars vs. black bars). Our results, therefore, show that the majority of cases at 5 Å can still benefit from Phenix, although a previous study (conducted predominantly with 2–4 Å resolution cryo-EM maps) suggested a 4.5 Å limit (Terwilliger et al., 2022).
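
The bookkeeping behind these counts is simple: a case is tallied as improved when the TM score of the refined model exceeds that of the initial AlphaFold2 model. The sketch below illustrates this with the four 6 Å values quoted in the text. It is not the full 13-case data set of Table 1, and the variable and function names are hypothetical.

```python
# TM scores quoted in the text: (initial AlphaFold2 model, model refined
# against the 6 Å hybrid map). Only four of the 13 cases are listed here.
tm_scores_6A = {
    "7LCI": (0.82, 0.94),
    "7L6U": (0.90, 0.95),
    "7LX5": (0.89, 0.92),
    "7KZZ": (0.76, 0.70),
}


def improved_cases(scores):
    """Return the PDB IDs whose refined TM score exceeds the initial one."""
    return [pdb_id for pdb_id, (initial, refined) in scores.items()
            if refined > initial]


improved = improved_cases(tm_scores_6A)
print(f"{len(improved)} of {len(tm_scores_6A)} listed cases improved: {improved}")
```

Applying the same function to the full table at each resolution would give the per-resolution counts reported above.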

Secondary structure analysis of refinement performance

As an example of the refinement performance, and to provide a demonstration of the challenges involved, we show one case, 7KZZ (PDB ID), in more detail. The model accuracy was increased from a TM score of 0.76 to 0.81 after refinement using the cryo-EM map of 3.42 Å resolution, but decreased to 0.70 using the hybrid map of 6 Å resolution (Table 1 and Fig. 4). This chain has an upper domain and a lower domain. The upper domain was predicted accurately using AlphaFold2, but the lower domain was not accurately predicted, as seen in either the superposition of the entire chain (Fig. 5a1) or the central axes of secondary structures (Fig. 5a2) (Stephanie et al., 2017). The lower domain contains six long helices with lengths between 21 and 30 amino acids. In fact, the sequence segments of the six helices are well-predicted, with the maximum shift of any of the 12 ends of the 6 helices within 4 amino acids when compared to the true structure.

Fig. 5. The intersecondary structure geometry for long helices in the predicted and refined models of 7KZZ chain B. (a1) The superposition of the protein structure (red, chain B of 7KZZ) and the model obtained from AlphaFold2 (yellow). (a2) Secondary structures of the superimposed models in a1 are represented by their central axes using AxisComparison (Haslam et al., 2018): the central axes of helices (red) and β-strands (green) in the structure, and the central axes of helices (yellow) and β-strands (black) in the model obtained from AlphaFold2. (a3) The axes of three consecutive long helices (H5, H6, and H7) of the structure are overlaid with the corresponding axes of three helices (H4, H5, and H6) of the model using two vectors, the vector of the central axes between Trp168 and Ala196, and the vector of the turn between Trp168 and Tyr165. Amino acids are labelled at the start and end of a helix. (b1) The box-cropped region of cryo-EM map 23093 (EMDB ID, yellow) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red) and the refined model (blue) obtained using Phenix and the box-cropped cryo-EM map in b1. (c1) Box-cropped hybrid density map of 6 Å resolution (grey) superimposed with the model refined from it. (c2) Superposition of the structure (red) and the model (cyan) refined using Phenix and the box-cropped hybrid map of 6 Å resolution. Annotation of secondary structures and molecular graphics was conducted with ChimeraX (Pettersen et al., 2021).

Although the individual helix segments are well-predicted, the arrangement of the six long helices deviates from the true structure. Therefore, it is impossible to fit the predicted model well with either the cryo-EM map or the hybrid density map (Fig. 5a2). Since fitting the initial model is a step before refinement, the incorrect arrangement of the six long helices presents a challenge that refinement needs to overcome. This might contribute partially to the limited enhancement from 0.76 to 0.81, not surpassing 0.9 in the TM score, even after refinement using the high-resolution cryo-EM map.

To illustrate the arrangement of the helices, we used three consecutive long helices and manually superimposed one of them (H7 in the true structure and H6 in the predicted model) so that the two vectors were approximately aligned (Fig. 5a3). The first vector represents the central axis of the helix between Trp168 and Ala196, and the second vector represents the turn between Trp168 and Tyr165 (Fig. 5a3). This demonstration of a subset of helices shows that the relative orientations of the other two helices in the model (yellow lines) differ from those in the true structure (red lines).
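
Once one helix has been overlaid, the difference in relative orientation of the remaining helices can be quantified as the angle between corresponding axis vectors. The following is a minimal sketch, assuming each central axis is summarised by its two end points. The coordinates are made-up placeholders rather than values from 7KZZ, and the function names are hypothetical.

```python
import math


def axis_vector(start, end):
    """Direction vector from the start point to the end point of a helix axis."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("axis has zero length")
    # Clamp to [-1, 1] to guard against rounding just outside the valid range.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical axis end points (Å) for one helix in the structure and in the model.
structure_axis = axis_vector((0.0, 0.0, 0.0), (0.0, 0.0, 30.0))
model_axis = axis_vector((0.0, 0.0, 0.0), (0.0, 15.0, 15.0))
deviation = angle_between(structure_axis, model_axis)
print(f"axis deviation: {deviation:.1f} degrees")
```

A small angle for the overlaid helix combined with large angles for its neighbours would correspond to the situation shown in Fig. 5a3, where individual helices are well predicted but their packing is not.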

Fig. 5 shows that knowledge of secondary structure locations in a density map can be important for refinement against lower-resolution maps. Because β-strands are spaced about 5 Å apart, individual strands are not detectable in density maps with a resolution lower than 6 Å. However, β-sheets are still detectable at resolutions better than about 8 Å, and α-helices at resolutions better than about 10 Å. Therefore, it might be possible to improve the refinement strategy to handle maps down to 8 Å resolution if secondary structure information is integrated. In practice, however, detection accuracy is affected by the local quality of a map and the complexity of a structure. A recent study presented a novel flexible fitting method for cryo-EM maps at intermediate resolutions (4–10 Å). The key idea was to guide the fitting by the correspondence between the α-helices in the cryo-EM map and those in the model (Dou et al., 2017).

To explore the potential benefit of secondary structure detection, we used DeepSSETracer (Mu et al., 2021), a deep learning-based method that can be plugged into ChimeraX to segment volumes belonging to test case 23274–7LCI-R. In this example, the β-sheet region (cyan) can be approximately segmented in the 8 Å resolution hybrid map (Fig. 6b vs. c or d). In addition, most of the helices (yellow) were detectable (Fig. 6b vs. c or d). Note that the detection was performed on a box-cropped map, so the assignment of features in Fig. 6 might include neighbouring chains. When the AlphaFold2 model was aligned with the detected secondary structure regions, the secondary structure regions were visually in good agreement (Fig. 6b). This is encouraging since it suggests an overall validity of the AlphaFold2 model. However, minor disagreement was observed between the model and the segmented secondary structure regions, as indicated by two arrows for the helix regions (Fig. 6b). At these two spots, the detected helix regions agree more with the atomic structure (Fig. 6b, d) and less with the AlphaFold2 model (Fig. 6b, c), and they point to locations for potential improvement in the AlphaFold2 model.

Fig. 6. Secondary structure regions detected from the box-cropped hybrid map of 8 Å resolution for 23274–7LCI-R (EMDB-PDB-chain ID). The helix (yellow) and β-sheet (cyan) regions in (a) and (b) were segmented from the hybrid map at 8 Å using DeepSSETracer (Mu et al., 2021). The model obtained from AlphaFold2 is coloured by secondary structure type, with helices in orange, β-sheets in cyan, and coil in grey, in (b) and (c), and is superimposed on the segmented regions in (b). The atomic structure (red) and the model refined (green) from the hybrid map are shown in (d) and (e), respectively.

One of the challenges of incorporating any secondary structure information into refinement is the tradeoff between density and secondary structure fitting. Although Phenix was developed for high-resolution maps and emphasises density and structure fitting, enforcing secondary structure alignment with the map could prevent catastrophic failures at low resolution, such as the melting and misfolding of the β-sheet domain (cyan in Fig. 6c), prominently depicted in Fig. 6e.

Discussion and conclusion

This validation study provided new evidence that AlphaFold2 models can be enhanced by exploiting cryo-EM density maps. Our results using hybrid maps suggest that the 4.5 Å resolution limit in Terwilliger et al. (2022) was perhaps a bit too conservative, and good quality AlphaFold2 models might benefit from a refinement against density maps as low as 6 Å resolution.

The accurate determination of atomic structures from cryo-EM maps of 4–6 Å resolution is, of course, still challenging. Understanding the strengths and weaknesses of refinement of initial models provides insights into developing more effective methods. The success of refinement depends on the quality of an initial model, the quality of the density map, the complexity of the structure, and, last but not least, the specific refinement approach. In general, one would not expect an effective refinement method for high-resolution maps to work well for lower-resolution maps, and vice versa.

Our tests have shown that secondary structure information can be beneficial in a future medium-resolution refinement approach. Secondary structures can be detected in cryo-EM maps from 5 to 10 Å resolution (Jiang et al., 2001). Many methods have been developed for the detection of both helices and β-sheets (Baker et al., 2007; Si and He, 2013; Li et al., 2016; Maddhuri Venkata Subramaniya et al., 2019; Wang et al., 2021). Despite recent progress in the development of deep learning detection methods, accurate detection is still challenging. Our test at 8 Å was generally at the limit of detectability for β-sheets, and close to the limit for α-helices, although the complexity of a structure also affects the accuracy of detection. In the example, the length of the detected helices was approximate, and there was also a certain amount of false positive β detection density (Fig. 6a, d). To utilise the strength of such predicted but imperfect secondary structure locations, the refinement method needs to take into account various factors, such as the likelihood of correct detection, local quality of the map, and local structural complexity. A well-predicted initial AlphaFold2 model could complement the secondary structure prediction, as well as the density matching. However, even the AlphaFold2 models are not perfect. As was the case in the bygone era of low-resolution cryo-EM maps, there remains the risk of a compounding of errors when fitting imperfect models to imperfect densities (Egelman, 2008).

A more tangible benefit of the present work is a new real-space tool for filtering experimental cryo-EM maps to an arbitrary lower resolution value without requiring an atomic structure. The simulation of density maps is an important computational approach to validating methods. Traditionally, a simulated density map of a protein structure is created using the atomic structure of a protein (Ludtke et al., 1999; Pettersen et al., 2004; Wriggers, 2012). However, it has been challenging to create simulated data that mimic experimental data in all aspects, such as resolution, noise, and artefacts, due to the 3D reconstruction process. In the current method, more realistic data in a high-resolution cryo-EM map, rather than ideal atomic positions, were included in the simulation. An interesting side effect is that the resulting hybrid maps are expected to retain some features of the original experimental EM density, such as inhomogeneous density distribution and local resolution variations (Swint-Kruse and Brown, 2005; de la Rosa-Trevin et al., 2016; Vilas et al., 2018). In other ways, the hybrid maps are also dominated by the effect of the Gaussian filter (i.e. high frequencies are attenuated rather than cut off or hidden in the noise). Thus, the hybrid maps could, in principle, exhibit a wide range of spatial frequencies, from low frequencies resulting from sample heterogeneity or variability (Leschziner and Nogales, 2007; Cardone et al., 2013; Katsevich et al., 2015; Naydenova and Russo, 2017; Lyumkis, 2019; Méndez et al., 2021; Punjani and Fleet, 2021), ranging all the way to the high frequencies in the experimental map (albeit attenuated). In future work, we will explore how well such hybrid maps mimic true low-resolution cryo-EM maps.
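
The idea of lowering resolution by real-space Gaussian filtering can be illustrated with the additive property of Gaussian variances: convolving a Gaussian of standard deviation σ_orig with a Gaussian kernel of standard deviation σ_k gives a Gaussian with σ_target² = σ_orig² + σ_k². The sketch below is not the Situs implementation. It assumes, purely for illustration, that a map of resolution r behaves like a Gaussian point spread with σ = r/2, it works on a 1D density profile instead of a 3D map, and all function names are hypothetical.

```python
import math


def kernel_sigma(res_orig, res_target):
    """Sigma (Å) of the Gaussian kernel that lowers res_orig to res_target.

    Assumes sigma = resolution / 2 (an illustrative convention) and that
    Gaussian variances add under convolution.
    """
    if res_target <= res_orig:
        raise ValueError("target resolution must be lower (larger value in Å)")
    s_orig, s_target = res_orig / 2.0, res_target / 2.0
    return math.sqrt(s_target ** 2 - s_orig ** 2)


def gaussian_kernel(sigma, spacing, cutoff=3.0):
    """Normalised 1D Gaussian kernel sampled on a grid with the given spacing (Å)."""
    half = int(math.ceil(cutoff * sigma / spacing))
    weights = [math.exp(-0.5 * (i * spacing / sigma) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(profile, kernel):
    """Convolve a density profile with a symmetric kernel, zero-padded at the ends."""
    half = len(kernel) // 2
    out = []
    for i in range(len(profile)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(profile):
                acc += w * profile[j]
        out.append(acc)
    return out


# Example: a single density peak from a 3.42 Å map filtered towards 6 Å on a 1 Å grid.
sigma_k = kernel_sigma(3.42, 6.0)
profile = [0.0] * 20 + [1.0] + [0.0] * 20
smoothed = convolve_1d(profile, gaussian_kernel(sigma_k, spacing=1.0))
```

Because the kernel is normalised, the filter spreads the peak without changing the total density, which matches the description above of high frequencies being attenuated rather than cut off.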

An intriguing effect of the resolution lowering afforded by hybrid maps is exemplified by two cases, 7BRM and 7EDA, where Phenix refinement performance was unexpectedly improved when the resolution was lowered to 5 Å. This suggests that the refinement of AlphaFold2 models against high-resolution cryo-EM maps can get trapped in local optima. The results also suggest that a more exhaustive sampling of conformations might be required, and that lowering resolution could be part of an annealing strategy to escape from local traps. This is yet another argument as to why it could make sense to develop a lower resolution refinement strategy even for high-resolution maps.

Acknowledgement

We thank Min Dong for IT support with the software installation.

Supplementary Materials

To view supplementary material for this article, please visit http://doi.org/10.1017/qrd.2022.13.

Data Availability Statement

Atomic models and maps used for testing are available at the public servers and databases (AlphaFold2, PDB, and EMDB; see section ‘Methods’), except for those of our refined models, which are available from the authors on reasonable request. The tools for creating hybrid experimental-simulated cryo-EM maps (see section ‘Methods’) will be available as part of the upcoming release of the Situs package at http://situs.biomachina.org.

Funding Statement

This work was supported by NIH Grant No. R01-GM062968, the ODU Batten Endowment to W.W., and a scholarship to M.A. by the Government of Saudi Arabia.

Open Peer Review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2022.13.

References

Afonine, PV, Poon, BK, Read, RJ, Sobolev, OV, Terwilliger, TC, Urzhumtsev, A and Adams, PD (2018) Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallographica Section D: Structural Biology 74(6), 531–544.
Baek, M, DiMaio, F, Anishchenko, I, Dauparas, J, Ovchinnikov, S, Lee, GR, Wang, J, Cong, Q, Kinch, LN and Schaeffer, RD (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373(6557), 871–876.
Baker, ML, Abeysinghe, SS, Schuh, S, Coleman, RA, Abrams, A, Marsh, MP, Hryc, CF, Ruths, T, Chiu, W and Ju, T (2011) Modeling protein structure at near atomic resolutions with gorgon. Journal of Structural Biology 174(2), 360–373.
Baker, ML, Ju, T and Chiu, W (2007) Identification of secondary structure elements in intermediate-resolution density maps. Structure 15(1), 7–19.
Bromiley, P (2003) Products and convolutions of Gaussian probability density functions. Tina-Vision Memo No. 2003–003, pp. 1–13.
Cardone, G, Heymann, JB and Steven, AC (2013) One number does not fit all: Mapping local variations in resolution in cryo-EM reconstructions. Journal of Structural Biology 184(2), 226–236.
Casañal, A, Shakeel, S and Passmore, LA (2019) Interpretation of medium resolution cryoEM maps of multi-protein complexes. Current Opinion in Structural Biology 58, 166–174.
Chacon, P, Tama, F and Wriggers, W (2003) Mega-Dalton biomolecular motion captured from electron microscopy reconstructions. Journal of Molecular Biology 326(2), 485–492.
Chen, M, Baldwin, PR, Ludtke, SJ and Baker, ML (2016) De novo modeling in cryo-EM density maps with Pathwalking. Journal of Structural Biology 196(3), 289–298.
Cheng, Y (2015) Single-particle cryo-EM at crystallographic resolution. Cell 161(3), 450–457.
de la Rosa-Trevin, JM, Quintana, A, Del Cano, L, Zaldivar, A, Foche, I, Gutierrez, J, Gomez-Blanco, J, Burguet-Castell, J, Cuenca-Alba, J, Abrishami, V, Vargas, J, Oton, J, Sharov, G, Vilas, JL, Navas, J, Conesa, P, Kazemi, M, Marabini, R, Sorzano, CO and Carazo, JM (2016) Scipion: A software framework toward integration, reproducibility and validation in 3D electron microscopy. Journal of Structural Biology 195(1), 93–99.
DiMaio, F, Song, Y, Li, X, Brunner, MJ, Xu, C, Conticello, V, Egelman, E, Marlovits, TC, Cheng, Y and Baker, D (2015) Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nature Methods 12(4), 361–365.
DiMaio, F, Tyka, MD, Baker, ML, Chiu, W and Baker, D (2009) Refinement of protein structures into low-resolution density maps using Rosetta. Journal of Molecular Biology 392(1), 181–190.
Dou, H, Burrows, DW, Baker, ML and Ju, T (2017) Flexible fitting of atomic models into cryo-EM density maps guided by helix correspondences. Biophysical Journal 112(12), 2479–2493.
Egelman, EH (2008) Problems in fitting high resolution structures into electron microscopic reconstructions. HFSP Journal 2(6), 324–331.
Google Colab Notebook (n.d.) AlphaFold with a density map. Available at https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFoldWithDensityMap.ipynb (accessed 16 May 2022).
Guo, Z, Wu, T, Liu, J, Hou, J and Cheng, J (2021) Improving deep learning-based protein distance prediction in CASP14. Bioinformatics 37(19), 3190–3196.
Haslam, D, Sazzed, S, Wriggers, W, Kovacs, J, Song, J, Auer, M and He, J (2018) A pattern recognition tool for medium-resolution cryo-EM density maps and low-resolution cryo-ET density maps. In Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) International Symposium on Bioinformatics Research and Applications, pp. 233–238. Springer, Cham. Beijing, China.
He, J, Lin, P, Chen, J, Cao, H and Huang, S-Y (2022) Model building of protein complexes from intermediate-resolution cryo-EM maps with deep learning-guided automatic assembly. Nature Communications 13(1), 1–16.
Hou, J, Wu, T, Cao, R and Cheng, J (2019) Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins: Structure, Function, and Bioinformatics 87(12), 1165–1178.
Hou, J, Wu, T, Guo, Z, Quadir, F and Cheng, J (2020) The MULTICOM protein structure prediction server empowered by deep learning and contact distance prediction. Methods in Molecular Biology 2165, 13–26.
Jiang, W, Baker, ML, Ludtke, SJ and Chiu, W (2001) Bridging the information gap: Computational tools for intermediate resolution structure interpretation. Journal of Molecular Biology 308(5), 1033–1044.
Jumper, J, Evans, R, Pritzel, A, Green, T, Figurnov, M, Ronneberger, O, Tunyasuvunakool, K, Bates, R, Žídek, A and Potapenko, A (2021a) Applying and improving AlphaFold at CASP14. Proteins: Structure, Function, and Bioinformatics 89(12), 1711–1721.
Jumper, J, Evans, R, Pritzel, A, Green, T, Figurnov, M, Ronneberger, O, Tunyasuvunakool, K, Bates, R, Žídek, A and Potapenko, A (2021b) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583–589.
Katsevich, E, Katsevich, A and Singer, A (2015) Covariance matrix estimation for the cryo-EM heterogeneity problem. SIAM Journal on Imaging Sciences 8(1), 126–185.
Kovacs, JA, Galkin, VE and Wriggers, W (2018) Accurate flexible refinement of atomic models against medium-resolution cryo-EM maps using damped dynamics. BMC Structural Biology 18(1), 1–11.
Kryshtafovych, A, Schwede, T, Topf, M, Fidelis, K and Moult, J (2021) Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics 89(12), 1607–1617.
Leschziner, AE and Nogales, E (2007) Visualizing flexibility at molecular resolution: Analysis of heterogeneity in single-particle electron microscopy reconstructions. Annual Review of Biophysics and Biomolecular Structure 36, 43–62.
Li, R, Si, D, Zeng, T, Ji, S and He, J (2016) Deep convolutional neural networks for detecting secondary structures in protein density maps from cryo-electron microscopy. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 41–46. IEEE. Shenzhen, China.
Li, Y, Zhang, C, Bell, EW, Zheng, W, Zhou, X, Yu, D-J and Zhang, Y (2021a) Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Computational Biology 17(3), e1008865.
Li, Y, Zhang, C, Zheng, W, Zhou, X, Bell, EW, Yu, DJ and Zhang, Y (2021b) Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins: Structure, Function, and Bioinformatics 89, 1911–1921.
Lindert, S, Alexander, N, Wotzel, N, Karaka, M, Stewart, PL and Meiler, J (2012) EM-Fold: De novo atomic-detail protein structure determination from medium-resolution density maps. Structure 20(3), 464–478.
Ludtke, SJ, Baldwin, PR and Chiu, W (1999) EMAN: Semiautomated software for high-resolution single-particle reconstructions. Journal of Structural Biology 128(1), 82–97.
Lyumkis, D (2019) Challenges and opportunities in cryo-EM single-particle analysis. Journal of Biological Chemistry 294(13), 5181–5197.
Maddhuri Venkata Subramaniya, SR, Terashi, G and Kihara, D (2019) Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nature Methods 16(9), 911–917.
Malhotra, S, Träger, S, Dal Peraro, M and Topf, M (2019) Modelling structures in cryo-EM maps. Current Opinion in Structural Biology 58, 105–114.
Mariani, V, Biasini, M, Barbato, A and Schwede, T (2013) lDDT: A local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29(21), 2722–2728.
Martz, E (n.d.) GDT_TS definition at the CASP 14 website. Available at https://proteopedia.org/wiki/index.php/Calculating_GDT_TS (accessed 17 May 2022).
Méndez, J, Garduno, E, Carazo, JM and Sorzano, COS (2021) Identification of incorrectly oriented particles in cryo-EM single particle analysis. Journal of Structural Biology 213(3), 107771.
Mirdita, M, Schütze, K, Moriwaki, Y, Heo, L, Ovchinnikov, S and Steinegger, M (2021) ColabFold-making protein folding accessible to all. Nature Methods 19, 679–682.
Mu, Y, Sazzed, S, Alshammari, M, Sun, J and He, J (2021) A tool for segmentation of secondary structures in 3D cryo-EM density map components using deep convolutional neural networks. Frontiers in Bioinformatics 1, 51.
Naydenova, K and Russo, CJ (2017) Measuring the effects of particle orientation to improve the efficiency of electron cryomicroscopy. Nature Communications 8(1), 1–5.
Pereira, J, Simpkin, AJ, Hartmann, MD, Rigden, DJ, Keegan, RM and Lupas, AN (2021) High-accuracy protein structure prediction in CASP14. Proteins: Structure, Function, and Bioinformatics 89(12), 1687–1699.
Pettersen, EF, Goddard, TD, Huang, CC, Couch, GS, Greenblatt, DM, Meng, EC and Ferrin, TE (2004) UCSF chimera—A visualization system for exploratory research and analysis. Journal of Computational Chemistry 25(13), 1605–1612.
Pettersen, EF, Goddard, TD, Huang, CC, Meng, EC, Couch, GS, Croll, TI, Morris, JH and Ferrin, TE (2021) UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Science 30(1), 70–82.
Punjani, A and Fleet, DJ (2021) 3D variability analysis: Resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. Journal of Structural Biology 213(2), 107702.
Sazzed, S, Scheible, P, Alshammari, M, Wriggers, W and He, J (2020) Cylindrical similarity measurement for helices in medium-resolution Cryo-electron microscopy density maps. Journal of Chemical Information and Modeling 60(5), 2644–2650.
Shen, T, Wu, J, Lan, H, Zheng, L, Pei, J, Wang, S, Liu, W and Huang, J (2021) When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14. Proteins: Structure, Function, and Bioinformatics 89, 1901–1910.
Si, D and He, J (2013) Beta-sheet detection and representation from medium resolution cryo-EM density maps. In BCB’13: Proceedings of ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, pp. 764–770. ACM. Washington, DC, USA.
Stephanie, Z, Julio, K, Willy, W and Jing, H (2017) Comparing an atomic model or structure to a corresponding cryo-electron microscopy image at the central Axis of a helix. Journal of Computational Biology 24(1), 52–67.
Swint-Kruse, L and Brown, CS (2005) Resmap: Automated representation of macromolecular interfaces as two-dimensional networks. Bioinformatics 21(15), 3327–3328.
Tama, F, Wriggers, W and Brooks, CL (2002) Exploring global distortions of biological macromolecules and assemblies from low-resolution structural information and elastic network theory. Journal of Molecular Biology 321(2), 297–305.
Terashi, G and Kihara, D (2018) De novo main-chain modeling for EM maps using MAINMAST. Nature Communications 9(1), 1–11.
Terwilliger, TC (n.d.) Rebuilding docked processed AlphaFold2 and other predicted models with a cryo-EM map. Available at https://phenix-online.org/documentation/reference/rebuild_predicted_model.html (accessed 16 May 2022).
Terwilliger, TC, Poon, BK, Afonine, P, Schlicksup, CJ, Croll, TI, Millan-Nebot, C, Richardson, JS, Read, RJ and Adams, PD (2022) Improving AlphaFold modeling using implicit information from experimental density maps. bioRxiv.
Thomas, C (n.d.) AlphaFold and Phenix. Available at https://phenix-online.org/documentation/reference/alphafold.html (accessed 16 May 2022).
Topf, M, Baker, ML, Marti-Renom, MA, Chiu, W and Sali, A (2006) Refinement of protein structures by iterative comparative modeling and CryoEM density fitting. Journal of Molecular Biology 357(5), 1655–1668.
Vilas, JL, Gómez-Blanco, J, Conesa, P, Melero, R, Miguel de la Rosa-Trevín, J, Otón, J, Cuenca, J, Marabini, R, Carazo, JM, Vargas, J and Sorzano, COS (2018) MonoRes: Automatic and accurate estimation of local resolution for electron microscopy maps. Structure 26(2), 337–344.
Wang, X, Alnabati, E, Aderinwale, TW, Subramaniya, SRMV, Terashi, G and Kihara, D (2021) Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nature Communications 12(1), 1–9.
Wang, RY-R, Kudryashev, M, Li, X, Egelman, EH, Basler, M, Cheng, Y, Baker, D and DiMaio, F (2015) De novo protein structure determination from near-atomic-resolution cryo-EM maps. Nature Methods 12(4), 335–338.
Wriggers, W (2010) Using situs for the integration of multi-resolution structures. Biophysical Reviews 2(1), 21–27.
Wriggers, W (2012) Conventions and workflows for using situs. Acta Crystallographica Section D: Biological Crystallography 68(4), 344–351.
Wriggers, W, Agrawal, RK, Drew, DL, McCammon, A and Frank, J (2000) Domain motions of EF-G bound to the 70S ribosome: Insights from a hand-shaking between multi-resolution structures. Biophysical Journal 79(3), 1670–1678.
Wriggers, W and He, J (2015) Numerical geometry of map and model assessment. Journal of Structural Biology 192(2), 255261.CrossRefGoogle ScholarPubMed
Wriggers, W, Milligan, RA, Schulten, K and McCammon, JA (1998) Self-organizing neural networks bridge the biomolecular resolution gap. Journal of Molecular Biology 284(5), 12471254.CrossRefGoogle ScholarPubMed
Wu, T, Liu, J, Guo, Z, Hou, J and Cheng, J (2021) MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. Scientific Reports 11(1), 19.Google ScholarPubMed
Xu, J (2019) Distance-based protein folding powered by deep learning. Proceedings of the National Academy of Sciences 116(34), 1685616865.Google ScholarPubMed
Xu, J, Mcpartlon, M and Li, J (2021) Improved protein structure prediction by deep learning irrespective of co-evolution information. Nature Machine Intelligence 3(7), 601609.CrossRefGoogle ScholarPubMed
Yang, J, Anishchenko, I, Park, H, Peng, Z, Ovchinnikov, S and Baker, D (2020) Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences 117(3), 14961503.CrossRefGoogle ScholarPubMed
Zhang, Y and Skolnick, J (2005) TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Research 33(7), 23022309.CrossRefGoogle ScholarPubMed
Zhang, X, Zhang, B, Freddolino, PL and Zhang, Y (2022) CR-I-TASSER: Assemble protein structures from cryo-EM density maps using deep convolutional neural networks. Nature Methods 19(2), 195204.CrossRefGoogle ScholarPubMed
Zhang, B, Zhang, X, Pearce, R, Shen, H-B and Zhang, Y (2020) A new protocol for atomic-level protein structure modeling and refinement using low-to-medium resolution cryo-EM density maps. Journal of Molecular Biology 432(19), 53655377.CrossRefGoogle ScholarPubMed
Zheng, W, Li, Y, Zhang, C, Pearce, R, Mortuza, S and Zhang, Y (2019) Deep-learning contact-map guided protein structure prediction in CASP13. Proteins: Structure, Function, and Bioinformatics 87(12), 11491164.CrossRefGoogle ScholarPubMed
Zheng, W, Li, Y, Zhang, C, Zhou, X, Pearce, R, Bell, EW, Huang, X and Zhang, Y (2021) Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins: Structure, Function, and Bioinformatics 89(12), 17341751.CrossRefGoogle ScholarPubMed

Table 1. Accuracy of models before and after refinement using high-resolution cryo-EM maps and hybrid density maps of 5, 6, and 8 Å resolutions

Fig. 1. Models obtained from AlphaFold2 and the refinements using cryo-EM map 23274–7LCI-R (EMDB-PDB-chain ID) and hybrid density maps at 6 and 8 Å resolutions. (a) Superposition of the protein structure (red, chain R of 7LCI) and the model obtained from AlphaFold2. (b1) The box-cropped region of cryo-EM map 23274 (EMDB ID, cyan) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red, chain R of 7LCI) and the refined model (blue) using Phenix and the box-cropped cryo-EM map in b1. Hybrid density maps of 6 Å (grey in c1) and 8 Å (yellow in d1) resolutions are superimposed with the corresponding models refined from the maps, respectively. The 6 Å-map-refined model (cyan ribbon in c1, c2) and 8 Å-map-refined model (green in d1, d2) are superimposed with the structure (red) in c2 and d2. The superposition of two atomic models was performed with TM-align (Zhang and Skolnick, 2005) in all figures. The superposition of a density map and a model was performed using Phenix (Terwilliger et al., 2022) in all figures. An example of a weaker density region in the cryo-EM map and in the corresponding hybrid maps is indicated by an ellipse in b1, c1, and d1.

Fig. 2. Models obtained from AlphaFold2 and the refinements using cryo-EM map 12183–7BGL-A-T1047S1D1 (EMDB-PDB-chain ID-CASP14 target ID) and hybrid density maps at 6 and 8 Å resolutions. (a) Superposition of the protein structure (red, chain A of 7BGL) and the model obtained from AlphaFold2 (yellow). This chain is one of the free modelling targets in CASP14 with ID T1047S1D1. (b1) The box-cropped region of cryo-EM map 12183 (EMDB ID, cyan) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red, chain A of 7BGL) and the refined model (blue) using Phenix and the box-cropped cryo-EM map in b1. Hybrid maps of 6 Å (grey in c1) and 8 Å (yellow in d1) resolutions are superimposed with the model refined from the corresponding map. The 6 Å-map-refined model (cyan ribbon in c1, c2) and 8 Å-map-refined model (green in d1, d2) are superimposed with the structure (red) in c2 and d2.

Fig. 3. Predicted models using AlphaFold2 for 7LV9-B and 7L6U-A (PDB ID–Chain ID). The structures (red) and models predicted using AlphaFold2 (yellow) are superimposed for chain B of 7LV9 (a) and chain A of 7L6U (b). See the Supplementary Material for more details about the two cases.

Fig. 4. Accuracy of models measured using TM-align. The TM score of each model was calculated against the protein structure downloaded from the PDB for 13 cases. In each case, the accuracy is shown from left to right for the model obtained using AlphaFold2 (black), the model refined using Phenix and the cryo-EM map (green), and the models refined using hybrid maps of 5 Å (red), 6 Å (blue), 8 Å (yellow), 10 Å (grey), and 12 Å (light blue) resolutions.
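For readers unfamiliar with the metric plotted in Fig. 4, the TM-score definition of Zhang and Skolnick can be sketched as below. This is a minimal illustration, not the TM-align program: it evaluates the score for one fixed superposition, whereas TM-align also searches for the alignment and superposition that maximise it. The function names are hypothetical, and the lower bound on d0 is an assumed guard for very short chains.

```python
import numpy as np


def tm_d0(l_target):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in Å)."""
    d0 = 1.24 * np.cbrt(l_target - 15.0) - 1.8
    # Assumption: clamp d0 for very short chains so the score stays well defined.
    return max(float(d0), 0.5)


def tm_score_fixed_superposition(distances, l_target):
    """TM-score for ONE given superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    l_target:  number of residues in the reference structure; unaligned
               residues contribute nothing to the sum but count in L.
    """
    d = np.asarray(distances, dtype=float)
    d0 = tm_d0(l_target)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)


# Toy inputs (synthetic, not data from this study)
perfect = tm_score_fixed_superposition(np.zeros(100), 100)
half_aligned = tm_score_fixed_superposition(np.zeros(50), 100)
print(perfect, half_aligned)
```

Because the sum is divided by the length of the reference structure, a model that matches only half of the residues perfectly scores about half as much as a model that matches all of them.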

Fig. 5. The intersecondary structure geometry for long helices in the predicted and refined models of 7KZZ chain B. (a1) The superposition of the protein structure (red, chain B of 7KZZ) and the model obtained from AlphaFold2 (yellow). (a2) Secondary structures of the superimposed models in a1, represented by their central axes using AxisComparison (Haslam et al., 2018): the central axes of helices (red) and beta-strands (green) in the structure, and the central axes of helices (yellow) and beta-strands (black) in the model obtained from AlphaFold2. (a3) The axes of three consecutive long helices (H5, H6, and H7) of the structure are overlaid with the corresponding axes of three helices (H4, H5, and H6) of the model using two vectors, the vector of the central axes between Trp168 and Ala196, and the vector of the turn between Trp168 and Tyr165. Amino acids are labelled at the start and end of a helix. (b1) The box-cropped region of cryo-EM map 23093 (EMDB ID, yellow) superimposed with the model (blue) refined using Phenix and the cryo-EM map. (b2) Superposition of the structure (red) and the refined model (blue) obtained using Phenix and the box-cropped cryo-EM map in b1. (c1) Box-cropped hybrid density map of 6 Å resolution (grey) superimposed with the model refined from it. (c2) Superposition of the structure (red) and the model (cyan) refined using Phenix and the box-cropped hybrid map of 6 Å resolution. Annotation of secondary structures and molecular graphics was conducted with ChimeraX (Pettersen et al., 2021).

Fig. 6. Secondary structure regions detected from the box-cropped hybrid map of 8 Å resolution for 23274–7LCI-R (EMDB-PDB-chain ID). The helix (yellow) and β-sheet (cyan) regions in (a) and (b) were segmented from the hybrid map at 8 Å using DeepSSETracer (Mu et al., 2021). The model obtained from AlphaFold2 is coloured by secondary structure type, with helices (orange), β-sheets (cyan), and coil (grey) in (c, b), and superimposed in (b). The atomic structure (red) and the model refined (green) from the hybrid map are shown in (d) and (e), respectively.

Supplementary material: PDF

Alshammari et al. supplementary material

Review: Refinement of AlphaFold2 Models against Experimental and Hybrid Cryo-EM Density Maps — R0/PR1

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: Alshammari et al. set out to test whether AlphaFold2 models can be refined against medium-resolution cryo-EM maps (5–12 Å). The test follows recent work by Terwilliger et al., who implemented a refinement protocol in the Phenix program that successfully refines AlphaFold2 models against high-resolution maps (better than 4.5 Å). The Phenix protocol applies an iterative procedure of four cycles; in each cycle, AlphaFold2 models are fitted to the cryo-EM map and refined using Phenix, and the resulting refined models are then fed back to AlphaFold2 for another round of modeling. For their test, Alshammari et al. ran a single cycle of the procedure. To address the limited number of accurate structures for low-resolution EM maps, the authors use the good idea of "hybrid density maps", which are high-resolution maps blurred with a Gaussian, lowering the resolution but, supposedly (see below), preserving some of the noise and imperfections present in the original maps. Altogether, they found that a single cycle of Phenix refinement can improve some models at resolutions as low as 6 Å. They explore examples and point out some of the specific reasons why refinement can fail.

The manuscript reads well and the study has overall been well conducted (but see below). For detailed claims such as a 4 Å vs 6 Å resolution limit, one would like to see a benchmark larger than 13 cases.

In contrast to the original Phenix procedure, where refined intermediate models are fed back to AlphaFold2 so that the EM map and AlphaFold2 work together to improve the models, here only one cycle is run. With one iteration, and thus no return to AlphaFold2, the authors test only the Phenix refinement procedure and not the new procedure by Terwilliger et al. With four cycles, results might be even better, pushing the resolution limit beyond 6 Å or increasing the success rate at 5–6 Å. That said, this would not change the conclusions of this study, so repeating with four cycles does not seem necessary.

Overall, while the study represents merely a test of an existing method (or one fourth of it) on a small benchmark, the question addressed is important for the field, and the manuscript can be a useful and timely reference and can inspire new developments.

Major comments:

- The authors simulate densities by first converting a high-resolution map to pseudo-atomic beads using vol2pdb and then converting the beads to a map at the desired resolution. Why not blur the map directly with a Gaussian? The authors say that their procedure "incorporates any quality variation within the parent high-resolution cryo-EM map into the hybrid map, resulting in a more realistic low-resolution density model", but are those "quality variations" preserved in the intermediate bead model? It must be tested or explained that the intermediate bead models preserve "quality variations", or those statements should be removed from the manuscript.

- I am not sure from the provided description whether blurring via the intermediate bead models produces a good estimate of the resolution of the resulting maps: is 6 Å from this "bead" procedure equivalent to a 6 Å map blurred directly? This equivalence must be demonstrated if the authors want to claim absolute thresholds. Alternatively, the work could be repeated with maps blurred directly, which might be better in general.
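The equivalence asked about in the two major comments can be checked numerically in an idealised setting. The sketch below is not the Situs vol2pdb/pdb2vol implementation: it assumes that every voxel becomes one bead weighted by its density, that both routes use the same Gaussian kernel, and that resolution corresponds to twice the kernel sigma (an assumed convention). All function names and the toy map are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

TRUNCATE = 6.0  # kernel support in units of sigma (illustrative choice)


def sigma_from_resolution(resolution, voxel_size):
    """Kernel sigma in voxels, ASSUMING resolution = 2 * sigma."""
    return 0.5 * resolution / voxel_size


def blur_direct(density, sigma):
    """Blur the map directly; density outside the box is treated as zero."""
    return gaussian_filter(density, sigma=sigma, mode="constant", cval=0.0,
                           truncate=TRUNCATE)


def blur_via_beads(density, sigma):
    """Turn each non-zero voxel into a bead weighted by its density and
    spread every bead with the same normalised Gaussian kernel."""
    radius = int(TRUNCATE * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    norm = np.exp(-0.5 * (offsets / sigma) ** 2).sum()  # discrete 1D normalisation

    def kernel(n, centre):
        return np.exp(-0.5 * ((np.arange(n) - centre) / sigma) ** 2) / norm

    nx, ny, nz = density.shape
    out = np.zeros(density.shape, dtype=float)
    for i, j, k in zip(*np.nonzero(density)):
        gx, gy, gz = kernel(nx, i), kernel(ny, j), kernel(nz, k)
        out += density[i, j, k] * (gx[:, None, None] * gy[None, :, None]
                                   * gz[None, None, :])
    return out


rng = np.random.default_rng(0)
toy_map = rng.random((8, 8, 8))          # synthetic stand-in for a cropped map
sigma = sigma_from_resolution(3.0, 1.0)  # 1.5 voxels under the assumed convention
max_diff = np.abs(blur_direct(toy_map, sigma)
                  - blur_via_beads(toy_map, sigma)).max()
print(f"largest voxel-wise difference: {max_diff:.2e}")
```

Under these assumptions the two routes are the same linear operation, so any difference should be at the level of floating-point round-off. The sketch says nothing about a bead conversion that thresholds the density, resamples the grid, or uses a different kernel definition, which is where the reviewer's question would have to be answered for the actual tools.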

Minor comments:

- Abstract: "Resolutions better than 4.5A were reached" - resolution of models? What does this sentence mean?

- Introduction: The authors list how many PDB structures have been deposited in different resolution ranges, but to assess the impact of their work it would be much more informative to list how many cryo-EM maps were deposited in these ranges, even if, or especially if, no PDB model was deposited.

- In 2.2, the symbols (letters?) in the formulas rendered as squares in my PDF. I assume they are fine, and I could understand them, but the editors should make sure that this is corrected.

- Table 1: could you add columns with average pLDDT and pTM for comparison? Does the success correlate with pLDDT and pTM as it does with TM? This would help readers assess to what extent they can rely on the predicted scores in judging whether the refinement might succeed.

- Figure 3: why are the EM map and refined model not shown as in the other figures?

Review: Refinement of AlphaFold2 Models against Experimental and Hybrid Cryo-EM Density Maps — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: In the presented work, the authors report on the utility of deep learning-based protein structure prediction, specifically AlphaFold2, in refinement against medium-to-low resolution cryo-EM maps. The work thoroughly discusses the dependencies, as well as the strengths and limitations, of integrating the experimental and AI-based approaches to build accurate models even from poorly resolved density maps (as low as 6 Å). Interestingly, the authors suggest that refinement using medium-to-low resolution cryo-EM maps might help to avoid conformational trapping in local minima. In doing so, they propose a new approach that generates low-resolution maps from high-resolution experimental maps instead of from atomic structures, as has traditionally been done. Overall, this work presents an insightful discussion for the community actively investigating efficient approaches to mitigate the scarcity of high-resolution structural models.

However, a few significant concerns are not clarified in the present manuscript.

1. (i) Here the authors propose "a novel hybrid experimental-simulated density map" generation approach that derives low-resolution maps from the high-resolution experimental map instead of from atomic models. However, the manuscript states "the high-resolution cryo-EM map was first converted to an intermediate PDB file using the Situs vol2pdb tool, with each voxel represented as a pseudoatom. Each pseudoatom was then convoluted with a Gaussian filter using a modified version of pdb2vol, with a filter size determined by the desired resolution of the hybrid map". So, here too the high-resolution cryo-EM maps are first converted to an atomic model and then converted back to density maps of the desired resolution. How is this different from generating a density map directly from an atomistic model, as has been practiced traditionally? The benefits of the newly proposed hybrid approach, as claimed in the present manuscript, are unclear from the current writing.

(ii) The missing notation in Section 2.2, "Hybrid Experimental-Simulated Density Maps", makes it hard to understand the current implementation. Please elaborate as required.

2. (i) The present work reports that the benefits of AlphaFold2 can still be obtained in the "twilight zone" (4–6 Å) of experimentally resolved maps. However, refinement against low-resolution maps with satisfactory accuracy is not achieved. A comment on how this problem could be handled, whether by integrating additional experimental data or by other hybrid statistical approaches, would make for an interesting discussion.

(ii) It seems that, for the 8 Å maps, the AF2 models without density refinement are of better quality than the refined ones. Can the authors comment?

There are a few minor concerns listed below:

1. In the Methods section, the last paragraph explains the generation of the density map for refinement. However, the write-up reads more like a reference to the Phenix documentation than a comprehensive general description of the workflow. A more inclusive description would be much appreciated by a broader/curious audience.

2. "... a more exhaustive sampling of conformations might be required, and that lowering resolution could be part of an annealing strategy to escape local traps. This is yet another argument as to why it could make sense to develop a lower resolution refinement strategy even for high-resolution maps". The motivation for refining an AF2 model with low-resolution maps even when high-resolution maps are available would be better introduced or emphasized at the beginning, as a clear motivation for the present work.

3. Please make the reference list complete and consistent.

Recommendation: Refinement of AlphaFold2 Models against Experimental and Hybrid Cryo-EM Density Maps — R0/PR3

Recommendation: Refinement of AlphaFold2 Models against Experimental and Hybrid Cryo-EM Density Maps — R1/PR4

Comments

No accompanying comment.

Recommendation: Refinement of AlphaFold2 Models against Experimental and Hybrid Cryo-EM Density Maps — R2/PR5

Comments

No accompanying comment.