
  • Open access
  • Published: 21 March 2023

Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing

  • Anthony C. Bishop   ORCID: orcid.org/0000-0003-4853-1073 1 ,
  • Glorisé Torres-Montalvo   ORCID: orcid.org/0000-0001-5240-6000 1 ,
  • Sravya Kotaru   ORCID: orcid.org/0000-0002-5632-8190 2 ,
  • Kyle Mimun   ORCID: orcid.org/0000-0003-0060-4149 1 &
  • A. Joshua Wand   ORCID: orcid.org/0000-0001-8341-0782 1 , 2 , 3 , 4  

Nature Communications volume 14, Article number: 1556 (2023)


  • Molecular conformation
  • Solution-state NMR

Assignment of resonances of nuclear magnetic resonance (NMR) spectra to specific atoms within a protein remains a labor-intensive and challenging task. Automation of the assignment process often remains a bottleneck in the exploitation of solution NMR spectroscopy for the study of protein structure-dynamics-function relationships. We present an approach to the assignment of backbone triple resonance spectra of proteins. A Bayesian statistical analysis of predicted and observed chemical shifts is used in conjunction with inter-spin connectivities provided by triple resonance spectroscopy to calculate a pseudo-energy potential that drives a simulated annealing search for the optimal set of resonance assignments. The approach, termed Bayesian Assisted Resonance Assignments by Simulated Annealing (BARASA), is implemented as a C++ program and tested on systems of up to more than 450 amino acids, including examples of intrinsically disordered proteins. BARASA is fast and robust, accommodates incomplete and incorrect information, and outperforms current algorithms, especially with sparse data. It is also fast enough to allow real-time evaluation during data acquisition.


Introduction

Nuclear magnetic resonance (NMR) spectroscopy is unique in its ability to provide simultaneous and comprehensive structural and dynamical atomic-scale information about macromolecules such as proteins in solution 1 , 2 , 3 , 4 . Unfortunately, an observed resonance frequency in an NMR spectrum cannot yet be directly assigned to the individual atom(s) within the protein from which it arises without the time-intensive collection and analysis of additional spectra. Comprehensive mapping of the individual resonances in NMR spectra to specific atoms within a protein molecule is a general prerequisite for the successful analysis of protein structure and dynamics by NMR spectroscopy. Early applications of multidimensional homonuclear ¹H NMR data to the so-called resonance assignment problem relied heavily on human intervention. The first comprehensive approach was the sequential assignment method, which centered on the identification of J-coupled spin systems 5 . These spin systems were then assembled through the short sequential distances revealed by nuclear Overhauser effect (NOE) interactions between neighboring residues, with the identity of side chains used to error-check against the primary structure 6 , 7 . The subsequent main chain directed (MCD) assignment strategy 8 , 9 formalized self-correcting cyclic patterns of backbone ¹H-¹H NOE interactions and provided a more robust algorithmic framework that somewhat relieved the complexity of identifying side chain resonances 10 , 11 . While the MCD approach did lead to the first fully automated assignment of ¹H resonances to backbone hydrogens 11 , automation of ¹H-based resonance assignments was generally frustrated by the overwhelming spectral degeneracy of multidimensional ¹H spectra of proteins and by interfering technical attributes such as a prominent diagonal.

The introduction of heteronuclear triple resonance spectroscopy 12 , 13 , 14 , 15 , 16 , 17 completely changed the landscape of the resonance assignment task by providing much greater resolution, generally higher-quality data and, most importantly, definitive rules with precise meanings for establishing connectivities (correlations) between backbone resonances. Triple resonance assignments of the protein backbone permit access, either directly or by tethering to side chain resonance assignments, to a wide range of dynamic phenomena 17 , 18 and structural information 19 , 20 , 21 .

Automated triple resonance algorithms have led to effectively complete backbone resonance assignments of smaller proteins with little human intervention and have greatly aided the assignment of larger systems 22 , 23 , 24 . Yet, even with the advent of transverse relaxation optimized spectroscopy (TROSY) 25 , comprehensive assignment of systems larger than 30 kDa remains remarkably rare. The limitations are quite analogous to those summarized above for earlier assignment strategies based exclusively on ¹H-¹H scalar and NOE interactions: increasing ambiguity in connectivities due to degeneracy, loss of resonances due to relaxation or artifacts, and other confounding spectral attributes are simply not sufficiently accommodated by current automated assignment strategies.

Here, we address data sparseness and ambiguity by appealing to Bayesian statistics, which use the available information more effectively through the calculation of explicit probabilities. Importantly, this formalism also allows flexible and adaptable incorporation of chemical shift predictions and structural knowledge into the assignment process. By implementing the Bayesian analysis within a simulated annealing engine, we develop a robust and efficient search for optimal solutions. Protein assignment algorithms utilizing simulated annealing have been developed in the past 26 . However, the stochastic algorithm described here takes advantage of readily available structural models, both experimentally determined and predicted, and in doing so more effectively exploits the rich information contained within structure-based predicted chemical shifts. We demonstrate how these restraints greatly aid the resonance assignment process, especially where data are otherwise sparse or even incorrect. We also compare the overall performance of BARASA with that of three highly cited assignment algorithms on a variety of experimental datasets.

Results and discussion

Bayesian assisted resonance assignments by simulated annealing (BARASA)

We designed an algorithm, termed BARASA, that uses a simulated annealing approach 27 to efficiently search the immense solution space for the optimal set of resonance assignments, starting from a set of raw crosspeaks derived from triple resonance spectra. The objective is to find the correct mapping of individual resonances to specific atoms within the protein molecule. The algorithm first assembles an initial set of spin systems based on an analysis of the crosspeak lists and the connectivity rules of the particular triple resonance experiments employed. Because of inherent degeneracy and missing or artifactual peaks, this process may yield neither an unambiguous nor a complete set of spin systems (see Methods). As a result, a given crosspeak could be associated with multiple, spectrally overlapping spin systems, in which case the crosspeak is randomly placed in one of them. The simulated annealing search engine then randomly distributes the starting set of spin systems to specific residue positions. If there are more spin systems than residue positions, the excess spin systems are placed in a cache for later use, as described below. The energy of this initial state is calculated as the sum of the energies of the individual spin systems currently placed in residue positions. Each spin system energy is composed of two terms: the adjacency energy and the chemical shift energy. The adjacency energy describes the interaction between two spin systems mapped to adjacent locations in the amino acid sequence. This energy is minimized if the Cα(i), Cβ(i), and C’(i) shifts of the spin system match the Cα(i-1), Cβ(i-1), and C’(i-1) shifts of the spin system at the following residue in the sequence. In contrast, the chemical shift energy describes the interaction between a spin system and its current residue position, i.e., it is defined by the local sequence and structure.
This energy is minimized when the resonances of the spin system closely match the predicted values for the current residue position while failing to match the predicted values at all other residue positions. Application of Bayes’ theorem to the predicted and experimental shifts then provides a posterior probability for assigning each spin system to each location in the sequence, and the chemical shift energy is calculated from this probability (see Methods for a more detailed description). After the initial energy calculation, a spin system or an individual crosspeak is randomly chosen. A spin system is either moved to an unoccupied residue position, swapped with another spin system, or added to the cache. Spin systems and crosspeaks deposited in their respective caches have no priority and are selected from the cache at random. Similarly, if a chosen crosspeak can be productively added to the crosspeak cache, swapped with another crosspeak in an overlapping spin system, or moved to an overlapping spin system, that move is proposed. For every proposed crosspeak or spin system swap, the decision to accept it is based on the energy of the system before and after the swap. Using an effective temperature T, the Metropolis criterion 28 is applied (Eq. 1).

\[ P_{\rm{accept}} = \exp\left(-\frac{\varDelta E}{T}\right) \qquad (1) \]

\(P_{\rm{accept}}\) is the probability of accepting the swap and \(\varDelta E\) is the change in energy due to the proposed swap. If \(\varDelta E\le 0\), then \(P_{\rm{accept}}\) is set to 1. If \(\varDelta E>0\), then \(0<P_{\rm{accept}}<1\), and a uniformly distributed random number \(r\) with \(0\le r\le 1\) is generated. If \(r\le P_{\rm{accept}}\), the swap is accepted; otherwise, the swap is rejected and the system state is left unchanged. Random swap attempts continue until the average energy of the system no longer varies significantly. \(T\) is then decreased following a highly optimized schedule based on a quantity analogous to the specific heat of the system (see Methods). The system is further cooled and equilibrated in this manner until a set of termination criteria is met, at which point the annealing protocol ends. Finally, to ensure that the system has reached a minimum in energy, a swap of each spin system with every other spin system, as well as of each crosspeak with every other possible crosspeak, is attempted, with only energy-decreasing changes accepted. This post-annealing minimization routine is repeated 100 times. The entire procedure, from initialization to minimization, is repeated 20 times. The algorithm then builds a consensus assignment set by choosing, for each residue location, the spin system (if any) that was assigned to it in a majority of the annealing runs. The consensus assignment set is further curated using criteria defined below to produce the final assignment set. The overall BARASA algorithm is outlined in Figs. 1 and 2.
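The acceptance rule of Eq. 1 (accept with probability 1 when ΔE ≤ 0 and exp(-ΔE/T) otherwise) and the post-annealing pairwise-swap minimization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BARASA C++ code: the function names are hypothetical, the temperature is assumed to be positive, the state is a plain list indexed by residue position, and the clean-up stops early once a full pass finds no energy-lowering swap rather than always running 100 passes. The specific-heat cooling schedule is not reproduced.

```python
import math
from typing import Callable, Tuple


def acceptance_probability(delta_e: float, temperature: float) -> float:
    """Metropolis criterion (Eq. 1): 1 for downhill moves, exp(-dE/T) otherwise."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / temperature)  # assumes temperature > 0


def metropolis_accept(delta_e: float, temperature: float, r: float) -> bool:
    """Accept the proposed swap when the uniform draw r (0 <= r <= 1) is <= P_accept."""
    return r <= acceptance_probability(delta_e, temperature)


def greedy_swap_minimize(
    state: list, energy: Callable[[list], float], max_passes: int = 100
) -> Tuple[list, float]:
    """Post-annealing clean-up: try every pairwise swap, keep only energy-lowering ones.

    Hypothetical simplification: stops early once a full pass makes no improvement.
    """
    current = energy(state)
    for _ in range(max_passes):
        improved = False
        for i in range(len(state)):
            for j in range(i + 1, len(state)):
                state[i], state[j] = state[j], state[i]
                trial = energy(state)
                if trial < current:
                    current, improved = trial, True  # keep the downhill swap
                else:
                    state[i], state[j] = state[j], state[i]  # undo the swap
        if not improved:
            break
    return state, current
```

In use, `r` would typically come from `random.random()`. With a toy energy that sums how far each label sits from its own index, `greedy_swap_minimize([2, 0, 1], energy)` converges to `[0, 1, 2]` with energy 0.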

Figure 1

a The search engine rests on a Bayesian-based simulated annealing protocol that uses a specific-heat mechanism to guide cooling. Crosspeak lists drawn from triple resonance spectra are assembled into putative spin systems, which are then randomly assigned to positions within the primary sequence of the protein. Sequential adjacency in the primary sequence is provided by apparent connectivities derived from triple resonance NMR spectra. Predicted chemical shifts, based on a high-resolution structural model or gleaned from empirical amino acid-specific distributions, are incorporated into the system energy using Bayesian statistics. Throughout annealing, crosspeaks may move among spin systems with overlapping resonances, changing the energies of the affected spin systems. Annealing involves Monte Carlo swapping of both crosspeak assignments to spin systems and spin system assignments to locations in the sequence. The concept of dynamic swapping of individual crosspeaks or entire spin systems is outlined in Fig. 2. Annealing continues until energy equilibration is achieved. The temperature is then lowered and the system re-equilibrated. Annealing is stopped when the termination criteria are met, and a local minimization routine is performed. b The final resonance assignments are developed from the results of multiple independent simulated annealing runs. c Ribbon representation of maltose binding protein (PDB code: 1DMB [ https://doi.org/10.2210/pdb1DMB/pdb ]) color-coded according to assignment status following analysis by BARASA: correctly assigned residues (blue), unassigned residues (white), prolines (red). See main text for further details.
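The consensus step (panel b, and the majority rule described in the text) can be sketched as follows. This is a hypothetical Python illustration, not the published implementation: each annealing run is represented as a dictionary from residue position to spin-system identifier, with None meaning unassigned, and the subsequent curation criteria are not modeled.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical representation: residue position -> spin-system id (None = unassigned).
Assignment = Dict[int, Optional[str]]


def consensus_assignments(runs: List[Assignment]) -> Assignment:
    """For each residue, keep the spin system assigned there in a strict majority of runs."""
    n_runs = len(runs)
    positions = sorted({pos for run in runs for pos in run})
    consensus: Assignment = {}
    for pos in positions:
        # Count only runs that actually placed a spin system at this position.
        votes = Counter(run[pos] for run in runs if run.get(pos) is not None)
        winner: Optional[str] = None
        if votes:
            spin_system, count = votes.most_common(1)[0]
            if count > n_runs / 2:
                winner = spin_system
        consensus[pos] = winner
    return consensus
```

Requiring a strict majority (more than half of the runs) has a useful side effect: because a spin system occupies at most one position per run, it cannot win a strict majority at two different positions, so the consensus set never assigns the same spin system twice.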

Figure 2

a Spin systems (orange puzzle pieces) begin in the cache (black box) and are initialized by random assignment to the sequence (purple pieces). Spin systems can then be swapped with others or moved to different locations in the sequence or to the cache. Spin systems and crosspeaks in their respective caches have no priority and are selected at random. Swaps are accepted or rejected with a probability based on the change in energy of the proposed swap. b The energy of each spin system depends on how it fits with the adjacent spin system (adjacency energy) and with the predicted shifts for that residue location (chemical shift energy). Exchange of crosspeaks between spin systems can be thought of as changing the shape of a puzzle piece. See main text and supplementary material for details.
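To make the two energy terms concrete, the sketch below computes a toy adjacency energy and a Bayesian chemical shift energy for one spin system. The actual functional forms and weights are given in the Methods, which are not part of this excerpt, so everything here is an assumption for illustration: the dictionary keys (`CA`, `CB`, `C` and their `_prev` counterparts), the ±1 connectivity scores and tolerances, independent Gaussian prediction errors, a uniform prior over residue positions, and taking the energy as the negative logarithm of the posterior.

```python
import math
from typing import Dict, List, Optional

# Hypothetical layout: own-residue shifts ("CA", "CB", "C") and preceding-residue
# shifts ("CA_prev", "CB_prev", "C_prev"), all in ppm.
Shifts = Dict[str, float]
NUCLEI = ("CA", "CB", "C")


def adjacency_energy(current: Shifts, following: Shifts, tol: Dict[str, float]) -> float:
    """Toy score: -1 per matching connectivity, +1 per mismatch, 0 if a shift is missing."""
    energy = 0.0
    for nuc in NUCLEI:
        mine: Optional[float] = current.get(nuc)
        theirs: Optional[float] = following.get(nuc + "_prev")
        if mine is None or theirs is None:
            continue
        energy += -1.0 if abs(mine - theirs) <= tol[nuc] else 1.0
    return energy


def gaussian_likelihood(obs: Shifts, pred: Shifts, sigma: Dict[str, float]) -> float:
    """P(observed shifts | this position), assuming independent Gaussian prediction errors."""
    like = 1.0
    for nuc in NUCLEI:
        if nuc in obs and nuc in pred:
            z = (obs[nuc] - pred[nuc]) / sigma[nuc]
            like *= math.exp(-0.5 * z * z) / (sigma[nuc] * math.sqrt(2.0 * math.pi))
    return like


def chemical_shift_energy(
    obs: Shifts, predictions: List[Shifts], position: int, sigma: Dict[str, float]
) -> float:
    """-ln P(position | observed shifts) via Bayes' theorem with a uniform prior.

    Normalising over every position is what penalises a spin system that would
    fit other positions just as well.
    """
    likes = [gaussian_likelihood(obs, pred, sigma) for pred in predictions]
    total = sum(likes)
    posterior = likes[position] / total if total > 0.0 else 0.0
    return -math.log(posterior) if posterior > 0.0 else math.inf
```

In this sketch a spin system's energy would simply be the sum of the two terms. Because the posterior is normalized over every residue position, a spin system that fits its current position well but fits other positions almost as well receives only a modest reward, which mirrors the behavior described in the text. A production implementation would work with log-likelihoods to avoid numerical underflow when predictions and observations are far apart.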

BARASA is accurate, robust, and fast

We tested BARASA on a set of six folded proteins of differing size and topology: human interleukin-1 receptor antagonist C66A, C122A (IL-1Ra, 152 residues, 17.1 kDa), human interleukin-1β (IL-1β, 154 residues, 17.5 kDa), S. solfataricus indole-3-glycerol phosphate synthase R43S (IGPS, 248 residues, 28.4 kDa), E. coli maltose binding protein (MBP, 371 residues, 40.8 kDa), the first cyclization domain of the Y. pestis yersiniabactin non-ribosomal peptide synthetase (Cy1, 453 residues, 51.9 kDa), and E. coli thymidylate synthase (ecTS, 264 residues, 61.0 kDa homodimer). In addition, we challenged the algorithm with two so-called intrinsically disordered proteins (IDPs): the V5 domain (residues 606-672) of human protein kinase C (V5dm, 68 residues, 7.7 kDa) and the intrinsically disordered region of human ANP32A (hIDD, 110 residues, 12.8 kDa). All crosspeak lists were derived from triple resonance data (Table 1). Crosspeak positions were taken from the canonical triple resonance spectra used for protein assignment (i.e., HSQC, HNCO 29 , HN(CA)CO 30 , HNCA 31 , HN(CO)CA 31 , HNCACB 32 , HN(CO)CACB/CBCA(CO)NH 33 ) (see Supplementary Table S1), with the exception of hIDD, for which crosspeaks were derived from provided spin systems. To generate crosspeaks from the hIDD spin systems, Gaussian error was added to the resonance values to create the chemical shifts of simulated crosspeaks (see Methods). Four of the data sets (IL-1Ra, IL-1β, IGPS, and MBP) were obtained in our laboratory. Crosspeak lists for Cy1, ecTS, and V5dm, and spin systems for hIDD, were kindly provided by Drs. Dominique Frueh (Johns Hopkins University), Andrew Lee (University of North Carolina at Chapel Hill), Tatyana Igumenova (Texas A&M University), and Martin Blackledge (Institut de Biologie Structurale), respectively.
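The hIDD input was produced by adding Gaussian error to spin-system resonances. A minimal Python sketch of that idea follows; the experiment-to-shift mapping, the per-nucleus standard deviations, and the function name are hypothetical, and the actual error model is described in the Methods. The spin-system identifier is carried along only so that results can be scored later; an assignment program would not see it.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical mapping: experiment -> spin-system keys that supply the carbon dimension.
EXPERIMENTS: Dict[str, List[str]] = {
    "HNCA": ["CA", "CA_prev"],
    "HN(CO)CA": ["CA_prev"],
    "HNCACB": ["CA", "CB", "CA_prev", "CB_prev"],
    "HNCO": ["C_prev"],
}

Peak = Tuple[str, str, float, float, float]  # (experiment, spin-system id, H, N, C)


def simulate_crosspeaks(
    spin_systems: Dict[str, Dict[str, float]], sigma: Dict[str, float], seed: int = 0
) -> List[Peak]:
    """Turn spin systems into noisy 3D crosspeaks by adding Gaussian error to every coordinate."""
    rng = random.Random(seed)
    peaks: List[Peak] = []
    for ss_id, shifts in spin_systems.items():
        for experiment, carbon_keys in EXPERIMENTS.items():
            for key in carbon_keys:
                if key not in shifts:
                    continue  # that correlation is absent for this spin system
                h = shifts["H"] + rng.gauss(0.0, sigma["H"])
                n = shifts["N"] + rng.gauss(0.0, sigma["N"])
                c = shifts[key] + rng.gauss(0.0, sigma["C"])
                peaks.append((experiment, ss_id, h, n, c))
    return peaks
```

Drawing the H and N noise independently for every peak mimics the small inconsistencies between spectra that complicate grouping crosspeaks into spin systems.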

The results from BARASA were compared to reference assignments to assess program performance. Reference assignments were obtained from the BMRB, directly from another lab, or determined manually by us (Table 1). Deposited assignments were manually mapped to the acquired spectra for comparison, and small differences in crosspeak positions between the deposited assignments and the acquired spectra were permitted to account for differences in experimental conditions. In addition, a small number of resonances assigned in the deposited data set for IL-1β were not present in the acquired spectra; these were removed from the reference assignments and considered unassigned when assessing algorithm performance (Supplementary Table 3). For the most part, reference assignments were considered complete, though in a few cases BARASA identified a small number of additional assignments that were confirmed manually and included in the reference assignments (Supplementary Tables 6–9). For each residue position, BARASA either outputs the spin system (and its associated resonances) assigned to that position or marks the position as unassigned. Each residue's BARASA assignment was classified as matching, missing, or mismatching relative to its counterpart in the reference assignments. A residue was considered matching if the amide group assigned to it by the algorithm was the same as in the reference, or if it was unassigned both by BARASA and in the reference assignments. A residue was designated missing if an amide group was assigned to that location in the reference assignments but BARASA did not assign that residue position. Lastly, a residue was labeled mismatching if BARASA assigned an amide group that either differed from the one in the reference assignments or was placed at a residue left unassigned in the reference assignments.
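The three-way classification above maps directly onto code. This Python sketch uses hypothetical function names and represents both assignment sets as dictionaries from residue position to amide identifier, with None for unassigned residues.

```python
from collections import Counter
from typing import Dict, Optional

Assignment = Dict[int, Optional[str]]  # residue position -> amide id (None = unassigned)


def classify_residues(predicted: Assignment, reference: Assignment) -> Dict[int, str]:
    """Label each residue as matching, missing, or mismatching against the reference."""
    labels: Dict[int, str] = {}
    for pos in sorted(set(predicted) | set(reference)):
        got, ref = predicted.get(pos), reference.get(pos)
        if got == ref:
            labels[pos] = "matching"  # same amide, or unassigned in both
        elif got is None:
            labels[pos] = "missing"  # reference assigns it, the algorithm does not
        else:
            labels[pos] = "mismatching"  # different amide, or reference leaves it unassigned
    return labels


def summarize(labels: Dict[int, str]) -> Dict[str, float]:
    """Fractions of residues in each category (the quantities plotted in Figs. 3 and 5)."""
    counts = Counter(labels.values())
    n = len(labels)
    return {k: counts.get(k, 0) / n for k in ("matching", "missing", "mismatching")}
```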

In general, when using structure-based chemical shifts and crosspeak lists derived from a comprehensive set of triple resonance experiments, BARASA produces (nearly) complete assignments relative to the manually curated reference assignments and, most importantly, very few errors (Fig. 3 and Supplementary Table 2). Individual statistics for each assignment are listed in Supplementary Tables 3–10. BARASA had relatively more difficulty with the Cy1 and IGPS examples. This is likely due to a higher variance in backbone chemical shifts among the different spectra, relative to the other test cases, arising from the use of multiple independently prepared samples; nevertheless, overall performance remained very good (Fig. 3). In the case of hIDD, a relatively high apparent mismatch rate is observed. Upon closer examination, the mismatching assignments made by BARASA were all assignments not previously reported. Many of these fall within low-complexity regions of the sequence (Supplementary Table 10), which is likely why they were difficult to assign manually. While there are no independent data supporting their veracity, the assignments proposed by BARASA and, as discussed below, by the next-best-performing automated assignment algorithm, FLYA 34 , are highly similar and are likely to be largely correct.

Figure 3

Comparison of automated assignment algorithms. Results of automated resonance assignments by BARASA using raw crosspeak lists drawn from a relatively comprehensive set of triple resonance experiments, compared with manually curated resonance assignments for eight test proteins: interleukin-1β (IL-1β), interleukin-1 receptor antagonist (C66A, C122A) (IL-1Ra), indole-3-glycerol phosphate synthase (R43S) (IGPS), maltose binding protein (MBP), non-ribosomal peptide synthetase (Cy1), thymidylate synthase (ecTS), V5 domain of protein kinase C (V5dm), and the intrinsically disordered region of human ANP32A (hIDD). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta), or missing (i.e., unassigned) (blue) relative to the reference assignments. *In the case of hIDD, a number of de novo assignments were made by BARASA; these are counted as mismatching with the reference assignments. See main text and Table 1. Source data are provided in the Source Data file.

BARASA used SHIFTX+ 35 predicted chemical shifts for the globular test proteins and random coil chemical shifts 36 , 37 for the so-called IDP examples as predicted shift restraints during annealing (see Methods). SHIFTX+ was chosen because it appears to be among the best-performing reported chemical shift prediction algorithms based solely on three-dimensional structural information and other physical parameters (i.e., temperature and pH). The related algorithm SHIFTX2, though it gives more accurate predictions, relies on the analysis of shifts from homologous proteins in addition to the three-dimensional structure of the protein being analyzed. We were concerned that the accuracy of SHIFTX2 would vary with the number of available homologs and, when few homologs are available, result in significantly larger errors than those reported for the average case. Because accurate estimation of prediction error is crucial to the Bayesian analysis (Methods), inaccurate or unaccounted-for variance in prediction errors could compromise performance. Furthermore, because SHIFTX2 searches for the known chemical shifts of homologous sequences as part of its prediction, it would use the previously deposited shifts of our test proteins present in the BMRB to generate the predicted shifts. Such shifts would not generally be available for the de novo assignment of a protein, so their use would be an invalid test of BARASA. We also note that predicted chemical shifts generated by SPARTA+ 38 gave results similar to those obtained with SHIFTX+ (Supplementary Table 11).

In this regard, it is important to appreciate that, given the distributions of chemical shifts, whether predicted or documented in the BMRB, values outside the error range are statistically expected. For example, if the distribution is taken as Gaussian and the standard deviation is used as the prediction error (see Methods), approximately 32% of all predictions would be expected to fall outside the considered error range. This is what is observed. Supplementary Tables 3–10 contain the likelihoods of the spin systems for the various test proteins. These likelihoods represent the probability of observing the experimental shifts given that the assignment is correct and range from 0 to 1. Likelihoods lower than 0.32 correspond to spin systems whose predicted resonance chemical shifts are, on average, beyond the specified error range; such spin systems are nevertheless well accommodated by BARASA.
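The 32% figure is the two-sided Gaussian tail beyond one standard deviation, erfc(1/√2) ≈ 0.317. The sketch below reproduces it and shows one hypothetical way to turn several per-nucleus deviations into a single likelihood between 0 and 1: combine the standardized deviations as a root-mean-square z-score and take its two-sided tail probability. With this assumed definition, a likelihood below about 0.32 corresponds exactly to an RMS deviation larger than one prediction error; the score tabulated in the Supplementary Tables may be defined differently.

```python
import math
from typing import Dict


def fraction_outside(k_sigma: float) -> float:
    """Fraction of a Gaussian lying more than k standard deviations from its mean."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def tail_likelihood(observed: Dict[str, float], predicted: Dict[str, float],
                    sigma: Dict[str, float]) -> float:
    """Hypothetical 0-to-1 score: two-sided tail probability of the RMS z-score."""
    zs = [(observed[n] - predicted[n]) / sigma[n] for n in observed if n in predicted]
    if not zs:
        raise ValueError("no nuclei in common between observed and predicted shifts")
    z_rms = math.sqrt(sum(z * z for z in zs) / len(zs))
    return math.erfc(z_rms / math.sqrt(2.0))


print(round(fraction_outside(1.0), 4))  # 0.3173
```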

Finally, BARASA produces a curated set of assignments from 20 annealing runs in under 1 hour for each system tested (see Supplementary Table 12). Given this combination of high accuracy and sub-hour runtimes, the advantages of BARASA become even more apparent for large proteins with suboptimal data sets.

The performance of BARASA with suboptimal data sets

The rather complete crosspeak lists from an extensive set of triple resonance experiments for each test protein provide valuable benchmarks for validating BARASA but are arguably not fully representative of the difficult protein systems that often challenge current applications of protein NMR spectroscopy. To examine the performance of BARASA with missing data and to identify the most impactful triple resonance information, individual crosspeaks, or all crosspeaks of entire spin systems, were randomly discarded from the MBP and ecTS data sets to generate compromised data sets that emulate data collection on challenging protein systems. Individual crosspeaks were randomly retained with a probability that depended on the crosspeak type (i.e., Cα, Cβ, or CO resonance). This process was repeated over a wide range of retention probabilities to produce many distinct data sets spanning a wide range of data completeness. These depleted peak lists were then used as input to BARASA; the results are provided in Supplementary Tables 13 and 14. In this way, the relative importance and completeness of different types of spectral data, as well as the effects of entirely missing spin systems, could be probed. In addition, a key question was the extent to which structure-based chemical shifts, as opposed to general BMRB residue-specific statistics, can rescue and aid the assignment process.
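Type-dependent random retention can be sketched as an independent coin flip per crosspeak. The peak layout (a dictionary with a `type` key) and the function name are hypothetical; the retention probabilities in the usage note are the Cα and Cβ values quoted for Fig. 4, with the CO retention set to its 0% condition.

```python
import random
from typing import Any, Dict, List

Peak = Dict[str, Any]  # hypothetical: each peak carries at least a "type" key


def deplete_crosspeaks(peaks: List[Peak], retention: Dict[str, float], seed: int = 0) -> List[Peak]:
    """Keep each crosspeak independently with the probability assigned to its type."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so probability 1.0 keeps everything and 0.0 drops everything.
    return [p for p in peaks if rng.random() < retention[p["type"]]]
```

For example, `deplete_crosspeaks(peaks, {"CA": 0.88, "CB": 0.25, "CO": 0.0})` keeps roughly 88% of Cα-type and 25% of Cβ-type crosspeaks and no CO-type crosspeaks; changing the seed produces a different depleted data set at the same retention levels.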

Figure 4 illustrates the robustness of BARASA when spectral data are missing. This specific example was generated using retention probabilities of 88% and 25% for the Cα- and Cβ-based information, respectively, and retention probabilities of either 0% or 75% for the CO-based information. Reliance on the BMRB database for predicted shifts, as opposed to structure-based shifts, yielded poor performance. In contrast, the use of structure-based SHIFTX+ 35 predictions entirely rescues the resonance assignment. These data indicate that structure-based chemical shift predictions serve as a powerful restraint in protein assignment, one large enough to potentially surpass the information provided by the CO experimental pair under many circumstances. This is likely because spin system adjacency is adequately established by the Cα and Cβ spectral information, so the remaining assignment ambiguity lies in residue type matching; CO resonances provide little residue type information and offer little help in this respect. We do not believe this observation to be an artifact of the parameterization of the energy function, since carbonyl-derived connectivities are weighted roughly the same as the chemical shift probability (Methods). As such, the energy provided by CO connectivity information would be of a magnitude similar to the total chemical shift energy of the spin system.

Figure 4

Shown are the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) relative to the reference assignments. Panels a–d correspond to results from input data sets in which entire spin systems were discarded from the crosspeak lists. The ordinate axis is the fraction of retained spin systems, and the dashed lines indicate the maximum fraction of possible matching assignments. Effects of spin system depletion on the analysis of MBP, with spin systems discarded individually at random (a) or as stretches of five consecutive residues (b). A similar analysis of ecTS with either individual (c) or groups of five consecutive (d) spin systems discarded. For the conditions with 0.8 and 0.6 fractions retained, ten random data sets retaining the indicated fraction of spin systems were generated. The performance of BARASA on each data set is shown as a single orange solid circle, with the bar height representing the arithmetic mean. The full data set (“1.0” condition) results were taken from Fig. 3. Only one result with the full data set was measured to avoid confounding run-to-run variation with variation due to differences in the input data set. Effects of restricting connectivity information to a single pair of triple resonance experiments with either residue-type statistics (BMRB) (e) or structure-based (SHIFTX+) (f) chemical shift predictions for MBP. Similarly, for ecTS using only residue-type statistics (BMRB) (g) or structure-based (SHIFTX+) (h) chemical shift predictions. Effect of random depletion of crosspeaks from the comprehensive set of triple resonance experiments, where the indicated percentage of each crosspeak type is retained, for the MBP (i) and ecTS (j) data sets used with residue-type statistics (BMRB) or structure-based (SHIFTX+) predicted chemical shifts.
Results of ten individual runs ( n  = 10) are plotted as solid orange circles and bar heights represent the arithmetic mean. Source data are provided as a Source Data file.

Data sets with randomly retained spin systems were generated in two ways: either by randomly discarding all crosspeaks of spin systems assigned in the reference assignments until only the indicated fraction of assigned spin systems remained, or by discarding crosspeaks in the same manner but only in sets of five random, sequence-contiguous spin systems. The latter condition simulates the common situation in which exchange broadening arising from the physical motion of contiguous stretches of sequence (e.g., loops) results in the loss of amide resonances. In both cases, BARASA still produces the overwhelming majority of the possible assignments without errors, even when up to 40% of the spin systems are missing (Fig. 4). There is little difference in performance whether the missing data are localized or distributed across the sequence. The performance of BARASA when challenged with artifact peaks, which often arise from low-concentration or unstable samples or from instrumentation, was also examined. In this case, a depleted data set from above was augmented with randomly generated artifact peaks. Only a modest decrease in performance is observed even when the crosspeak list is contaminated with 20% artifactual entries (Supplementary Fig. 1).
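The contiguous-block depletion can be sketched as repeatedly removing runs of five consecutive assigned positions until the target fraction remains. In this hypothetical Python version, "consecutive" means consecutive entries in the list of assigned positions, a run that overlaps already-discarded positions removes fewer new ones, and the last run is trimmed so that exactly the requested number of spin systems survives; how the original procedure handled overlaps or overshoot is not stated in this section.

```python
import random
from typing import List, Set


def deplete_contiguous(
    positions: List[int], fraction_retained: float, block: int = 5, seed: int = 0
) -> Set[int]:
    """Return the positions whose spin systems survive block-wise depletion.

    `positions` lists (in sequence order) the residues carrying a reference assignment.
    Runs of `block` consecutive entries are discarded at random; the final run is
    trimmed so that exactly round(fraction_retained * len(positions)) entries remain.
    """
    rng = random.Random(seed)
    n = len(positions)
    target = round(fraction_retained * n)
    kept = set(positions)
    while len(kept) > target:
        start = rng.randrange(max(1, n - block + 1))  # random start of a contiguous run
        for pos in positions[start:start + block]:
            if len(kept) == target:
                break  # stop mid-run rather than overshoot the target
            kept.discard(pos)
    return kept
```

Calling this with `fraction_retained=0.6` on 100 assigned positions leaves exactly 60, with the 40 removed positions clustered into runs as in the loop-broadening scenario.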

Even with the considerable time savings introduced by non-uniform sampling 39 , collection of NMR data on proteins remains time intensive. The strong performance of BARASA with data missing from a comprehensive set of triple resonance experiments raised the possibility that BARASA could tolerate a reduced set of triple resonance experiments. We tested this hypothesis using ecTS and MBP, analyzing information from a single triple resonance experimental pair (e.g., HNCA and HN(CO)CA) combined with BMRB or SHIFTX+ predicted shifts. The Cα- and Cβ-type triple resonance pairs are equally useful in the BARASA assignment process when SHIFTX+ shifts are provided, but the Cβ information becomes relatively more effective when relying on BMRB amino acid distributions (Fig. 4). This is clearly due to the greater residue type information intrinsic to the Cβ resonance. Overall, BARASA performs extremely well with only the Cα- or only the Cβ-type triple resonance experimental pair. In contrast, the CO-type pair is much less effective when used alone, likely because carbonyl carbon shifts are less sensitive to amino acid type and local structure.
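With only one experiment pair, sequential links rest on a single nucleus type, which is where degeneracy becomes limiting. The hypothetical Python sketch below lists, for each spin system, the other spin systems whose Cα(i) shift matches its Cα(i-1) shift within a tolerance; the dictionary keys and the 0.2 ppm default tolerance are assumptions for illustration.

```python
from typing import Dict, List


def candidate_predecessors(
    spin_systems: Dict[str, Dict[str, float]], key: str = "CA", tol: float = 0.2
) -> Dict[str, List[str]]:
    """For each spin system, list others whose `key` shift matches its `key`_prev shift."""
    links: Dict[str, List[str]] = {}
    for sid, ss in spin_systems.items():
        prev = ss.get(key + "_prev")
        if prev is None:
            links[sid] = []  # the (i-1) correlation was not observed
            continue
        links[sid] = sorted(
            other
            for other, o in spin_systems.items()
            if other != sid and key in o and abs(o[key] - prev) <= tol
        )
    return links
```

For example, with spin systems `a` (Cα 56.0, Cα(i-1) 62.1), `b` (Cα 62.0) and `c` (Cα 62.15), `a` has two candidate predecessors, `b` and `c`. A second nucleus type such as Cβ, or the chemical shift energy, is needed to decide between them, which is consistent with the value of residue-type information discussed above.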

Comparison to alternate automated resonance assignment algorithms

Computer-assisted resonance assignment strategies for the analysis of triple resonance spectra have been employed for over two decades. For comparison, three highly cited algorithms were evaluated alongside BARASA: FLYA, AutoAssign 22 , and I-PINE 40 . The same crosspeak lists derived from the comprehensive set of triple resonance experiments were used for all four algorithms (Fig. 5). BARASA achieved the highest fraction of assignments matching the reference assignments among all the algorithms in all test cases. BARASA outperformed AutoAssign and I-PINE by considerable margins, most notably for the two IDPs examined, while offering only a marginal improvement over FLYA (Supplementary Table 2). Importantly, BARASA made few mismatching assignments (<3%), whereas I-PINE had up to 20% mismatches, meaning that about 1 in 5 of its assignments were incorrect. For these reasons, AutoAssign and I-PINE were not examined further.

figure 5

Performance of BARASA, FLYA, AutoAssign (AA), and I-PINE against reference triple resonance assignments of eight protein systems: a IL-1β; b IL-1Ra; c IGPS; d MBP; e CY1; f ecTS; g V5dm; h hIDD. Shown are the fractions of residues that are accurately matched to the reference assignments (green), incorrectly matched (magenta) or missing (i.e., unassigned) (blue). *BARASA and, to a lesser extent, FLYA extended the reference assignments for hIDD considerably (Supplementary Table  10 ). These extended assignments are therefore denoted here as mismatching. Source data are provided as a Source Data file.

The marginal advantage of BARASA over FLYA when utilizing a comprehensive triple resonance data set prompted us to examine their behavior in the more challenging situations commonly encountered in practice. BARASA’s performance in settings with a significant amount of missing data was compared against that of FLYA. MBP and ecTS crosspeak lists with varying retention probabilities were generated and used as input for BARASA and FLYA (Fig.  6 ). BARASA generated a higher assignment match rate in all scenarios, with the difference in performance between the algorithms growing as the data became increasingly sparse. Meanwhile, the mismatch rates of the two algorithms remained similar. These results demonstrate that BARASA performs well when a large quantity of data is missing, greatly outperforming FLYA, the strongest of the alternative algorithms tested.

figure 6

The effects of random crosspeak depletion on the analysis of MBP ( a ) and ecTS ( b ) comprehensive triple resonance data sets with partial retention of the indicated crosspeak types (see text and Fig.  4 ). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) to the reference assignments. Ten independent data sets ( n  = 10) were randomly generated for each depletion condition. The results of analysis by BARASA for each data set are shown as solid orange circles and the bar heights correspond to the mean. Source data are provided as a Source Data file.

Use of predicted versus experimentally determined structural models

It is clear from Fig.  4 that the use of structure-based chemical shift predictions provides significant advantages over simple residue-type predictions derived from empirical distributions. This is particularly true in the case of Cy1, which is perhaps an exemplar of the challenges facing modern protein NMR and whose assignment required a battery of experimental spectra and labeling schemes 41 . The sheer number of samples and experiments required resulted in relatively high variation in resonance positions among the spectra. The resonance assignment was carried out in the absence of an experimentally determined structural model, the closest homolog having only 38% sequence identity. Accordingly, the resonance assignment of Cy1 must be considered a significant achievement.

The absence of an experimentally determined atomic-resolution structure of the protein of interest is common and can severely limit the resonance assignment process. However, powerful structure-prediction algorithms have recently been introduced 42 , and we sought to learn how the availability of structures predicted by the AlphaFold2 algorithm influences the performance of BARASA. Chemical shifts predicted by SHIFTX+ from the AlphaFold2-predicted structure of Cy1 were used for analysis by BARASA. Using only residue-type information based on the BMRB resulted in poor performance. However, when utilizing the chemical shifts predicted from the predicted structure of Cy1, BARASA recapitulated the performance it achieved with the NMR-derived structure and a comprehensive set of triple resonance experiments. In addition, BARASA performed very well using subsets of triple resonance experiment pairs and significantly outperformed FLYA (Fig.  7 ). This level of success of BARASA using SHIFTX+ in concert with structures predicted by AlphaFold2 was observed across the test data sets (Supplementary Table  15 ). Taken together, these data suggest that the lack of an experimental structure is unlikely to prevent BARASA from performing to its full capability.

figure 7

Resonance assignment by BARASA using the indicated crosspeak types from the triple resonance spectra and either residue-type (BMRB) chemical shift statistics ( a ) or chemical shifts predicted by SHIFTX+ from a structural model provided by AlphaFold2 ( b ). Triple resonance data sets include the peaks from the following spectra: HNCA/HN(CO)CA (Cα), HN(CA)CB/HN(COCA)CB (Cβ) and HNCO/HN(CA)CO (CO). Bar heights indicate the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) relative to the reference assignments. Equivalent runs with FLYA ( c ) using the data set of ( b ) reinforce the conclusion that BARASA is more robust to non-ideal data. Source data are provided as a Source Data file.

In summary, we have demonstrated that Bayesian-based simulated annealing combining sequential relationships derived from triple resonance spectra with chemical shift information predicted from a high-resolution structural model can greatly facilitate the triple resonance backbone assignment of proteins. The implementation of this strategy in BARASA is robust to incompleteness of spin system definition (sparseness) and to the overall complexity of the resonance assignment challenge (protein size). Importantly, BARASA is relatively conservative and makes few errors. An optimized annealing strategy that uses the specific heat to guide the cooling schedule results in a very rapid analysis. This speed, combined with the robustness described above, positions BARASA to inform decisions during real-time data acquisition, which becomes increasingly feasible with automated crosspeak picking. Iterative examination by BARASA of sequentially acquired triple resonance spectra could, in principle, allow the user to determine whether a satisfactory level of assignment can be achieved without further data acquisition and thereby save valuable spectrometer time. The BARASA algorithm thus provides the ability to assign unusually difficult protein systems easily and robustly, simplifying this otherwise challenging task. The combination of fast and robust backbone resonance assignments with structure-based methyl resonance assignments 43 , 44 , 45 , 46 , 47 , 48 will considerably reduce the resonance assignment barrier and allow the power of NMR spectroscopy to be applied in a facile manner to otherwise challenging proteins.

NMR sample production

A vector encoding the gene for interleukin-1β (IL-1β) was transformed into E. coli BL21(DE3) cells and expressed in 1 L of 95% D2O M9 medium containing 15NH4Cl and 2H,13C-glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 h at 37 °C, after which cells were harvested by centrifugation at 3500 × g and frozen overnight. The cell pellet was then thawed and resuspended in 10 mM potassium phosphate pH 8.0, 0.2 mM EDTA, 5 mM DTT and 1 mM PMSF. The cells were lysed by sonication and centrifuged at 32,000 × g for 30 min at 4 °C. The lysate was then brought to 80% saturation with (NH4)2SO4 and stirred for 1 h at 4 °C. The suspension was centrifuged for 30 min at 32,000 × g and 4 °C, and the pellet was resuspended in 25 mM ammonium acetate pH 4.5, 1 mM BME and dialyzed overnight (8 kDa MWCO) against the same buffer at 4 °C. The dialyzed protein was loaded onto a HiTrap Capto S column (Cytiva Life Sciences) equilibrated in 25 mM ammonium acetate pH 4.5, 1 mM BME and eluted with a linear gradient up to 500 mM ammonium acetate pH 4.5, 1 mM BME. The protein was then frozen and lyophilized. The lyophilized protein was dissolved in 20 mM Tris pH 8.0, 7 M urea, 20 mM DTT and added dropwise to 20 volumes of 20 mM Tris, 100 mM NaCl, 5 mM DTT pH 8.0 under constant stirring. The refolded protein was dialyzed against 50 mM sodium acetate pH 5.0, 5 mM DTT and concentrated to 0.67 mM. To this sample, 0.02% NaN3, 100 μM DSS and 5% D2O were added. Triple resonance spectra were acquired at 23 °C on an 800 MHz (1H) Bruker NEO spectrometer running TopSpin and equipped with a CryoProbe.

A vector encoding the gene for human interleukin-1 receptor antagonist (IL-1Ra) containing C66A/C122A amino acid substitutions was expressed in E. coli BL21(DE3) cells in M9 minimal medium containing 15NH4Cl and 13C-glucose as the sole nitrogen and carbon sources, respectively. The culture was centrifuged at 5000 rpm, and the cell pellet was resuspended in 20 mM Tris, 500 mM NaCl, 20 mM imidazole, pH 7.9 for sonication. Sonicated cells were centrifuged at 15,000 rpm, and the supernatant was loaded onto a His60 column (Takara Bio USA). The column was washed with 20 mM Tris, 500 mM NaCl, 40 mM imidazole, pH 7.9, and protein was eluted with 20 mM Tris, 500 mM NaCl, 500 mM imidazole, pH 7.9. The collected protein fraction was buffer exchanged into 12.5 mM HEPES, 50 mM NaCl, 5 mM CaCl2, pH 6.5 for His-tag removal by FXa protease (New England Biolabs) and further purified by both affinity (His60 resin) and size exclusion chromatography (S-75 Sephadex, Cytiva Life Sciences). The NMR sample was prepared by buffer exchanging the protein into 100 mM NaCl, 25 mM MES, pH 6.0 and concentrating it to 1 mM, with the addition of 100 μM DSS, 5% D2O, and 0.02% NaN3 (Supplementary Table  1 ). Triple resonance assignment experiments were acquired at 35 °C on either a 500 MHz Bruker Avance spectrometer or an 800 MHz (1H) Bruker NEO spectrometer, both equipped with a CryoProbe.

An R43S variant of the gene for indole-3-glycerol phosphate synthase from S. solfataricus (IGPS) was cloned into a modified pGS-21a vector downstream of an N-terminal His-tag and TEV protease site. This expression plasmid was a gift from the lab of Professor Robert Matthews, University of Massachusetts Medical School, Worcester. IGPS R43S protein was expressed in BL21(DE3) competent cells under ampicillin selection. Cells were grown at 37 °C to an OD600 of 0.6, and 1 mM IPTG was added to induce expression for 16–20 h at 25 °C. To isotopically label the protein for NMR spectroscopy, cells were grown in M9 minimal medium with 15NH4Cl and 13C-glucose as the nitrogen and carbon sources, respectively. Cells were lysed by sonication in 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 5 mM imidazole. The lysate was loaded onto a Ni2+-NTA column pre-equilibrated with the lysis buffer. Impurities weakly bound to the column were washed away with 100 mM potassium phosphate, pH 7.5, 150 mM KCl, 75 mM imidazole, followed by equilibration into the low-salt buffer 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 75 mM imidazole. Protein was eluted with 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 500 mM imidazole and dialyzed into lysis buffer. Purified His-tagged protein was concentrated to 5 mL, and the tag was cleaved with TEV protease added at a 1:30 mass ratio with mixing at room temperature overnight. Untagged protein was separated from TEV protease and uncleaved protein by Ni2+-affinity chromatography. Protein aliquots were flash frozen and stored at −80 °C. NMR samples of 15N,13C-labeled IGPS were prepared at 250 µM in 60 mM potassium phosphate, pH 7.2, 50 mM KCl, 5% D2O, 100 μM DSS. All data were collected at 50 °C on a 750 MHz (1H) Bruker AVANCE III NMR spectrometer equipped with a CryoProbe.

A vector encoding maltose binding protein (MBP) was transformed into BL21(DE3) cells and expressed in 1 L of 95% D2O M9 medium containing 15NH4Cl and 2H,13C-glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 h at 37 °C, after which cells were harvested by centrifugation at 3500 × g. The cell pellet was frozen overnight and resuspended in 20 mM Tris-HCl, 20 mM NaCl pH 8.0, 1 mM DTT. Lysozyme (6 mg) was added and the suspension was stirred for 30 min at room temperature. Cells were further lysed by sonication and centrifuged at 32,000 × g for 30 min at 4 °C. The clarified lysate was filtered (0.45 µm pore size) and loaded onto a 25 mL DEAE column equilibrated in 20 mM Tris, 20 mM NaCl, pH 8.0, 1 mM DTT. The protein was eluted using a gradient to 20 mM Tris, 500 mM NaCl. Protein was concentrated to 1–2 mL and run on a 112 mL Superdex 75 column equilibrated in 20 mM Tris, 20 mM NaCl, 2 mM DTT pH 8.0. The protein was pooled and unfolded by dialysis against 4 M GuHCl, 20 mM Tris-HCl, 1 mM DTT pH 7.5. Protein was refolded by repeated 10-fold dilution with 20 mM sodium phosphate pH 7.1, 1 mM EDTA, 2 mM β-cyclodextrin, 0.02% NaN3, 100 μM DSS, 5% D2O followed by concentration (4 times). From this, a 0.5 mM sample of MBP was prepared. Spectra were acquired at 37 °C on an 800 MHz (1H) Bruker NEO NMR spectrometer. NMR data acquisition and processing parameters recorded by us for IL-1β, IL-1Ra, IGPS and MBP are summarized in Supplementary Table  1 . Poisson-gap NUS spectra were reconstructed using hmsIST 39 and all spectra were processed with NMRPipe 49 on NMRbox 50 . Spin systems were built by manual peak picking using NMRFAM-SPARKY 51 and referenced to DSS.

Origin of protein test data sets

Triple resonance data acquired in our laboratory were processed using NMRPipe 49 installed on NMRbox 50 . The crosspeak lists for these data (see Table  1 ) were constructed by manual crosspeak picking using NMRFAM-SPARKY 51 (i.e., they were not reconstructed from deposited assignments). Crosspeak lists for ecTS, Cy1 and V5dm were provided by Professors Andrew Lee (University of North Carolina, Chapel Hill), Dominique Frueh (Johns Hopkins University), and Tatyana Igumenova (Texas A&M University), respectively, and were used without further adjustment. Crosspeaks for hIDD were generated from spin systems provided by Professor Martin Blackledge (Institut de Biologie Structurale) as follows. Each provided spin system consisted of amide proton (H) and amide nitrogen (N) chemical shifts as well as chemical shifts for the Cα, Cα(i-1), CO and CO(i-1) resonances (although a complete set of carbon resonances was not present for every spin system). HNCA, HN(CO)CA, HNCO, and HN(CA)CO crosspeak lists were generated from the spin system data by adding the following crosspeaks from each spin system to the indicated list: H-N-Cα(i-1) and H-N-Cα for the HNCA; H-N-Cα(i-1) for the HN(CO)CA; H-N-CO(i-1) for the HNCO; and H-N-CO and H-N-CO(i-1) for the HN(CA)CO. The resonance values for the crosspeak positions were drawn from a normal distribution with a mean given by the value of the resonance in the spin system and a standard deviation of 0.003, 0.04, and 0.04 ppm for hydrogen, nitrogen and carbon resonances, respectively.
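The hIDD peak-list generation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each crosspeak receives its own independent noise draw, and the dictionary keys (`"CA"`, `"CA(i-1)"`, `"CO(i-1)"`, ...) and function name are hypothetical. The HNCO list is filled from the preceding-residue carbonyl, CO(i-1), which is the carbonyl an HNCO detects.

```python
import random
from typing import Dict, List, Optional, Tuple

# Standard deviations (ppm) quoted in the text for the added Gaussian noise.
NOISE_SD = {"H": 0.003, "N": 0.04, "C": 0.04}

# Hypothetical spin-system keys: "H", "N", "CA", "CA(i-1)", "CO", "CO(i-1)".
# Each synthetic peak list receives the listed carbon resonances.
PEAK_LISTS = {
    "HNCA": ["CA(i-1)", "CA"],
    "HN(CO)CA": ["CA(i-1)"],
    "HNCO": ["CO(i-1)"],
    "HN(CA)CO": ["CO", "CO(i-1)"],
}

Peak = Tuple[float, float, float, str]  # (H, N, carbon shift, carbon label)


def synthesize_peak_lists(
    spin_systems: List[Dict[str, Optional[float]]], rng: random.Random
) -> Dict[str, List[Peak]]:
    """Turn spin systems into noisy H-N-C crosspeak lists (hypothetical helper)."""
    lists: Dict[str, List[Peak]] = {name: [] for name in PEAK_LISTS}
    for ss in spin_systems:
        for name, carbons in PEAK_LISTS.items():
            for label in carbons:
                shift = ss.get(label)
                if shift is None:  # incomplete spin system: no peak for this carbon
                    continue
                lists[name].append((
                    rng.gauss(ss["H"], NOISE_SD["H"]),
                    rng.gauss(ss["N"], NOISE_SD["N"]),
                    rng.gauss(shift, NOISE_SD["C"]),
                    label,
                ))
    return lists


# Usage: the second spin system lacks CA(i-1) and both carbonyls.
rng = random.Random(0)
systems = [
    {"H": 8.0, "N": 120.0, "CA": 56.0, "CA(i-1)": 55.0, "CO": 176.0, "CO(i-1)": 175.0},
    {"H": 8.2, "N": 118.0, "CA": 60.0},
]
peaks = synthesize_peak_lists(systems, rng)
```

Incomplete spin systems simply contribute fewer crosspeaks, which mirrors the note above that not every spin system carried a full set of carbon resonances.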

BARASA algorithm description

The algorithm begins by reading in the crosspeak lists to assemble spin systems. Within the crosspeak lists, the user provides the possible crosspeak types that are produced by the experiment. For example, the HNCA would produce possible crosspeak types of H-N-CA(i) and H-N-CA(i-1). The user also specifies cutoff values for each spectral dimension that dictate the range over which chemical shifts will be matched during spin system construction. The provided crosspeak types dictate which dimensions have resonances of ambiguous type. In the example of the HNCA, the first two dimensions are of unambiguous type (H and N resonances respectively). However, the third dimension is ambiguous (CA(i) or CA(i-1)).
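One way to represent the user-supplied crosspeak types, and to derive from them which dimensions are ambiguous, is sketched below. The encoding (one resonance label per spectral dimension) and the function names are assumptions for illustration, not BARASA's input format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: each crosspeak type lists one resonance label per dimension.
CROSSPEAK_TYPES: Dict[str, List[Tuple[str, ...]]] = {
    "HNCA": [("H", "N", "CA(i)"), ("H", "N", "CA(i-1)")],
    "HN(CO)CA": [("H", "N", "CA(i-1)")],
    "HN(CA)CB": [("H", "N", "CB(i)"), ("H", "N", "CB(i-1)")],
}


def ambiguous_dimensions(types: List[Tuple[str, ...]]) -> List[int]:
    """Indices of dimensions whose resonance label differs between crosspeak types."""
    n_dims = len(types[0])
    return [d for d in range(n_dims) if len({t[d] for t in types}) > 1]


def possible_labels(types: List[Tuple[str, ...]], dim: int) -> Set[str]:
    """All resonance labels a peak could carry in the given dimension."""
    return {t[dim] for t in types}


print(ambiguous_dimensions(CROSSPEAK_TYPES["HNCA"]))      # [2]
print(ambiguous_dimensions(CROSSPEAK_TYPES["HN(CO)CA"]))  # []
```

For the HNCA, only the third (carbon) dimension is ambiguous, matching the example in the text.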

BARASA builds spin systems by first arbitrarily choosing a crosspeak to seed the construction of a spin system. All other crosspeaks are searched to find those that have at least two resonances of unambiguous type that match the resonances of the seed crosspeak, both in chemical shift (i.e., they fall within a user-specified tolerance) and in resonance type. After each addition, BARASA attempts to resolve ambiguous resonance types based on the chemical shifts already present in the spin system. For example, suppose a spin system has a Cα(i-1) shift of 56.0 ppm (with a tolerance of 0.3 ppm) and an HNCA crosspeak with a carbon shift of 58.0 ppm (which could be of type Cα or Cα(i-1)) is added. Because 58.0 ppm is not within 0.3 ppm of the 56.0 ppm Cα(i-1) shift, the algorithm resolves the type of the new crosspeak as Cα. After adding the crosspeak and resolving its type, the algorithm iterates through the entire list of remaining crosspeaks and repeats the above addition procedure. Once no more peaks can be added to the spin system, a new crosspeak is arbitrarily chosen from the list of remaining peaks to seed the construction of another spin system. This continues until all peaks have been added to a spin system.
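The greedy assembly and type-resolution rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: each crosspeak is reduced to H, N and one carbon shift; matching is done on the amide shifts of the seed; the tolerances are hypothetical defaults; and a peak that clashes is simply left to seed its own spin system later, a simplification of the degeneracy handling described next.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)                   # identity comparison, so list.remove() is exact
class Crosspeak:
    h: float
    n: float
    c: float
    possible_types: Set[str]           # e.g. {"CA(i)", "CA(i-1)"} for an HNCA peak
    resolved: Optional[str] = None     # carbon type once it is unambiguous


@dataclass
class SpinSystem:
    h: float                           # amide shifts of the seed crosspeak
    n: float
    peaks: List[Crosspeak] = field(default_factory=list)

    def mean_shift(self, rtype: str) -> Optional[float]:
        shifts = [p.c for p in self.peaks if p.resolved == rtype]
        return sum(shifts) / len(shifts) if shifts else None


def compatible_types(ss: SpinSystem, peak: Crosspeak, c_tol: float) -> Set[str]:
    """Drop any candidate type that the spin system already holds at a different shift."""
    keep = set()
    for t in peak.possible_types:
        mean = ss.mean_shift(t)
        if mean is None or abs(peak.c - mean) <= c_tol:
            keep.add(t)
    return keep


def build_spin_systems(peaks: List[Crosspeak], h_tol: float = 0.03,
                       n_tol: float = 0.3, c_tol: float = 0.3) -> List[SpinSystem]:
    remaining = list(peaks)
    systems: List[SpinSystem] = []
    while remaining:
        seed = remaining.pop(0)                    # "arbitrary" seed: first remaining peak
        if len(seed.possible_types) == 1:
            seed.resolved = next(iter(seed.possible_types))
        ss = SpinSystem(seed.h, seed.n, [seed])
        added = True
        while added:                               # rescan until nothing else fits
            added = False
            for peak in list(remaining):
                if abs(peak.h - ss.h) > h_tol or abs(peak.n - ss.n) > n_tol:
                    continue
                types = compatible_types(ss, peak, c_tol)
                if not types:
                    continue                       # clash: peak will seed its own spin system
                peak.resolved = next(iter(types)) if len(types) == 1 else None
                ss.peaks.append(peak)
                remaining.remove(peak)
                added = True
        systems.append(ss)
    return systems


# Usage reproducing the worked example: CA(i-1) at 56.0 ppm, then an HNCA peak at 58.0 ppm.
peaks = [
    Crosspeak(8.00, 120.00, 56.00, {"CA(i-1)"}),            # HN(CO)CA
    Crosspeak(8.01, 120.05, 58.00, {"CA(i)", "CA(i-1)"}),   # HNCA -> resolved as CA(i)
    Crosspeak(8.00, 120.00, 56.05, {"CA(i)", "CA(i-1)"}),   # HNCA -> resolved as CA(i-1)
    Crosspeak(7.50, 115.00, 60.00, {"CA(i)", "CA(i-1)"}),   # different residue
]
systems = build_spin_systems(peaks)
```

Ambiguous peaks that cannot yet be resolved keep `resolved = None`; a fuller version would revisit them as more of the spin system is filled in.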

If BARASA finds a crosspeak with two unambiguous resonances that match those already present in a spin system, but with additional resonances whose shifts conflict with those already present in the spin system, then an additional spin system is created in which to place the incongruent crosspeak. Such a situation arises from spectral degeneracy (e.g., two spin systems with the same or similar amide shifts). The algorithm then attempts to add the remaining peaks to both spin systems. Any further clashes are resolved by generating a new spin system. This continues until no more crosspeaks can be added to any spin system. The crosspeaks within this group of spin systems are then marked as allowed to exchange with any other spin system within the group during the annealing process. In addition, the user has the option to allow the algorithm to use a crosspeak cache, to which low-intensity peaks (lowest 5%) can be added over the course of the annealing run, providing a mechanism to eliminate potentially artifactual crosspeaks.
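Selecting the crosspeaks eligible for the low-intensity cache might look like the sketch below. Two details are assumptions rather than statements from the text: intensities are ranked by absolute value (some triple resonance experiments give peaks of opposite sign), and the 5% count is rounded down.

```python
from typing import List, Set


def cache_eligible(intensities: List[float], fraction: float = 0.05) -> Set[int]:
    """Indices of crosspeaks in the lowest `fraction` of absolute intensity."""
    n_low = int(len(intensities) * fraction)   # rounds down
    order = sorted(range(len(intensities)), key=lambda i: abs(intensities[i]))
    return set(order[:n_low])


print(sorted(cache_eligible([float(x) for x in range(1, 41)])))  # [0, 1]
```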

Once all crosspeaks have been added to a spin system, all possible resonance type sets are generated for that spin system. A resonance type set is a complete designation of the atom type of each resonance of each crosspeak in a spin system. If a spin system contains only peaks with no ambiguous resonance types, then it has only one possible resonance type set. This is the case for most data sets, because experiments with ambiguous resonance types are often paired with experiments that resolve the ambiguity (e.g., the HNCA/HN(CO)CA experimental pair). However, if ambiguous resonance types are present in a spin system, then all possible resonance type sets are generated for it. A distinct set of average resonance values is calculated for each resonance type set of the spin system, and all of these are considered over the course of the annealing run.
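Enumerating resonance type sets amounts to a Cartesian product over the possible types of each crosspeak, followed by per-type averaging. The sketch below assumes a hypothetical encoding in which each crosspeak is reduced to a single carbon shift and its candidate types.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

# A crosspeak's carbon shift and the resonance types it might carry (hypothetical encoding).
Peak = Tuple[float, Tuple[str, ...]]


def resonance_type_sets(peaks: List[Peak]) -> List[Tuple[str, ...]]:
    """Every complete choice of one type per crosspeak (Cartesian product)."""
    return list(product(*(types for _, types in peaks)))


def averaged_shifts(peaks: List[Peak], type_set: Tuple[str, ...]) -> Dict[str, float]:
    """Average chemical shift of each resonance type under one resonance type set."""
    grouped: Dict[str, List[float]] = {}
    for (shift, _), rtype in zip(peaks, type_set):
        grouped.setdefault(rtype, []).append(shift)
    return {rtype: mean(values) for rtype, values in grouped.items()}


peaks = [(56.0, ("CA(i-1)",)), (58.0, ("CA(i)", "CA(i-1)")), (56.1, ("CA(i)", "CA(i-1)"))]
sets = resonance_type_sets(peaks)    # 1 x 2 x 2 = 4 type sets
avg = averaged_shifts(peaks, ("CA(i-1)", "CA(i)", "CA(i-1)"))
```

A spin system whose peaks are all unambiguous yields a product with exactly one element, which is the common case noted above.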

The resonance assignment analysis is then initialized by randomly assigning the spin systems to the protein sequence. Often there are more spin systems than residue positions (e.g., spin systems correlated to side-chain amide groups rather than the backbone are also present in the data set). Any spin systems that are not randomly placed on the sequence are placed in a spin system cache and may be assigned to the sequence over the course of the run. The simulation temperature is initialized at 1000 arbitrary units, and a spin system or crosspeak is chosen at random to swap. The probability that a swap is a crosspeak swap is set at 0.01 (found to be a good compromise between sampling and algorithm speed); the remaining swaps are spin system swaps. A chosen spin system can be added to the spin system cache, swap positions with another spin system, or move to an empty position in the sequence, making its former position available. Whenever a spin system is moved, a random resonance type set is chosen from among those possible. In addition, the algorithm may attempt to change the current resonance type set while keeping the spin system in place. A crosspeak chosen to swap can be added to the crosspeak cache (if it is of low intensity), added to another spin system within its spin system group, or swapped with any crosspeak within its spin system group. Upon moving or swapping crosspeaks, the affected spin systems are evaluated for clashes. If there are none, the crosspeak swap proceeds; otherwise, the swap is rejected. In addition, a crosspeak move or swap triggers the affected spin systems to generate all new resonance type sets and choose one at random from the possibilities. This forces a recalculation of the average chemical shifts for each resonance type set of each affected spin system, as well as of the Bayesian probabilities described below for sequence position determination.
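The random initialization and the move-type selection can be sketched as below. The 0.01 crosspeak-swap probability comes from the text; choosing uniformly among the move subtypes, the move labels, and ignoring sequence constraints (e.g., prolines) during initialization are simplifying assumptions.

```python
import random
from typing import List, Optional, Tuple

P_CROSSPEAK_SWAP = 0.01  # crosspeak-swap probability quoted in the text

# Hypothetical move labels for the move types described in the text.
SPIN_SYSTEM_MOVES = ("to_cache", "swap_with_spin_system", "to_empty_position",
                     "change_type_set")
CROSSPEAK_MOVES = ("to_crosspeak_cache", "move_within_group", "swap_within_group")


def initialize(spin_ids: List[str], n_residues: int,
               rng: random.Random) -> Tuple[List[Optional[str]], List[str]]:
    """Randomly place spin systems on the sequence; leftovers start in the cache."""
    ids = list(spin_ids)
    rng.shuffle(ids)
    placed: List[Optional[str]] = list(ids[:n_residues])
    placed += [None] * (n_residues - len(placed))   # empty positions if too few
    rng.shuffle(placed)
    return placed, ids[n_residues:]


def propose_move(rng: random.Random) -> Tuple[str, str]:
    """Pick a crosspeak move with probability 0.01, otherwise a spin-system move."""
    if rng.random() < P_CROSSPEAK_SWAP:
        return "crosspeak", rng.choice(CROSSPEAK_MOVES)
    return "spin_system", rng.choice(SPIN_SYSTEM_MOVES)


rng = random.Random(1)
placed, cache = initialize(["a", "b", "c", "d", "e"], 3, rng)  # 3 placed, 2 cached
```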

If the swap is not immediately rejected due to a crosspeak clash, the change in the energy of the system due to the swap, \(\Delta E\), is calculated using the energy function described below. The swap is then accepted with a probability given by the Metropolis criterion (Eq.  1 ):

\[{P}_{accept}=\min \left(1,\exp \left(-\frac{\Delta E}{T}\right)\right)\tag{1}\]

After each successful swap, the energy of the state is recorded and stored as part of a sample of energy values. Once the sample reaches a particular size, the sample mean and standard error are calculated, and an additional sample is generated by continued swapping. A two-tailed Student’s t-test is performed to compare the means of the two samples. The system is considered to have equilibrated at the current temperature if the p-value of the t-test is greater than a user-supplied value (default p > 0.5). If equilibration has not been reached, more swaps are performed to generate an additional sample, and the t-test is repeated with the two most recent samples. If equilibration has been reached, the energy values are used to estimate the specific heat at the current temperature:

\[{C}_{v}=\frac{\left\langle {E}^{2}\right\rangle -{\left\langle E\right\rangle }^{2}}{{T}^{2}}\tag{2}\]

Where \(T\) is the current temperature in arbitrary units, \(E\) is the energy of the system, and the angled brackets indicate the sample mean. Large drops in the average ensemble energy due to oversized temperature steps can cause the system to become trapped in a local minimum. By choosing a target energy drop that is unlikely to lead to a frustrated state, we can use the specific heat \({C}_{v}\) calculated at each temperature to estimate the temperature drop needed to achieve the target change in energy. This is done in the following manner:

\[{T}_{next}=T+\frac{\triangle {\left\langle E\right\rangle }_{target}}{{C}_{v}}\tag{3}\]

Where \(\triangle {\left\langle E\right\rangle }_{{target}}\) is a user-controlled parameter, kept at −2000 for this study. In situations where the system is becoming trapped in a frustrated state, decreasing the magnitude of the target energy drop can lead to better results at the expense of longer simulation time. If \({T-T}_{{next}}\) is greater than 10, the temperature decrease is limited to 10 units to prevent overcooling the system. Using the specific heat in this manner results in smaller temperature steps at temperatures where the system is rapidly decreasing in energy, while allowing larger steps when drops in temperature have a modest effect on the ensemble. The resulting schedule avoids quenching the system while minimizing unproductive swaps at temperatures that are either too high or too low for effective annealing. After the temperature is decreased, the annealing run terminates if any of the following criteria are met: the temperature is less than 1; the product of the temperature and the most recently calculated specific heat is less than 200; or the ratio of unsuccessful to successful swaps while collecting the last sample is greater than 10,000. The rationale for these criteria is as follows. Given the standard energy parameterization, productive annealing is unlikely at temperatures below 1. The product of the specific heat and the current temperature (at low temperatures) provides a crude estimate of the energy separating the current ensemble from the global minimum (i.e., the thermodynamic ensemble at T = 0, which should correspond to a single state), and approximately 200 energy units is negligible. Finally, at this ratio of unsuccessful to successful swaps, the system is near a minimum and further sampling is inefficient. If termination is not reached, a new sample size is defined using the following equation:

Where N is the number of residues in the sequence. This equation permits increases in sample size when sampling at temperatures with high specific heats, which is where the most productive swaps occur. This approach also permits scaling of sampling for larger proteins. The parameters of this equation were found empirically to be a good compromise between sufficient sampling and speed. Samples are then drawn at the new temperature to determine equilibration and the cycle is continued. Upon termination of the annealing protocol, a steepest-descent type search is performed to locally minimize the system energy and refine the assignments, discarding potentially bad assignments that were left over from the run. This is done by attempting to place (or swap) every spin system/peak at every possible location in the sequence/spin system group (including the cache, if allowed). Only spin system/peak swaps/placements that decrease the system energy are accepted. This is repeated 100 times.
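The acceptance rule, the specific-heat estimate, the specific-heat-guided temperature step and the three stopping rules described above can be sketched as small Python helpers. These are illustrative assumptions rather than BARASA's C++ code: the Boltzmann constant is taken as 1 in the arbitrary units, the specific heat uses the standard energy-fluctuation estimator, and a zero specific heat (or zero successful swaps) is treated as a signal to take the maximum step (or stop). The t-test equilibration check and the sample-size rule are not reproduced.

```python
import math
import random
from statistics import pvariance
from typing import Sequence


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill, uphill with prob exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def specific_heat(energies: Sequence[float], temperature: float) -> float:
    """Fluctuation estimate (<E^2> - <E>^2) / T^2, with k_B taken as 1."""
    return pvariance(energies) / temperature ** 2


def next_temperature(temperature: float, heat: float,
                     target_drop: float = -2000.0, max_step: float = 10.0) -> float:
    """Step so that heat * (T_next - T) ~ target_drop, capped at max_step units."""
    step = -target_drop / heat if heat > 0 else max_step
    return temperature - min(step, max_step)


def should_terminate(temperature: float, heat: float,
                     failed_swaps: int, successful_swaps: int) -> bool:
    """The three stopping rules listed in the text."""
    if temperature < 1.0:
        return True
    if temperature * heat < 200.0:
        return True
    if successful_swaps == 0:
        return True                      # treat an infinite ratio as stalled
    return failed_swaps / successful_swaps > 10_000


print(next_temperature(1000.0, 400.0))   # 995.0: a 5-unit step meets the target drop
print(next_temperature(1000.0, 100.0))   # 990.0: the 20-unit step is capped at 10
```

The cap matters mostly where the specific heat is small: there a large step would be needed to reach the target energy drop, and the 10-unit limit keeps the schedule from cooling too abruptly.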

This entire simulated annealing process is independently repeated from a number of different random starting conditions; here we used 20. A consensus set of assignments is generated by calculating the frequency with which each spin system is placed at each amino acid location. The spin system assigned to a residue location in a majority of the runs (if any) is kept as the consensus spin system. A curated set of assignments is then generated from this consensus analysis as follows. The consensus spin system at each residue is chosen as the tentative assignment for that residue. Residues without a consensus spin system (i.e., those that did not have the same spin system assigned in more than 50% of the runs) are marked as unassigned. Tentatively assigned spin systems are then evaluated by their posterior probabilities as well as by their number of connectivities, a connectivity being defined as a matching resonance between adjacent spin systems. Assignments are accepted if they meet any of the following criteria: 1) the assigned spin system has at least two connectivities with adjacent spin systems; 2) the assigned spin system has at least one connectivity with adjacent spin systems and a posterior probability at least three times the quantity 1/N; or 3) the assigned spin system has a posterior probability > 50%. Residues with tentative assignments that do not satisfy any of these criteria are then marked as unassigned.
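The majority-vote consensus and the three curation rules translate directly into code. In this sketch each run is represented, as an assumption, by a list mapping residue index to a spin system identifier (or `None` when unassigned); the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def consensus(runs: List[List[Optional[str]]]) -> List[Optional[str]]:
    """Per residue, the spin system placed there in more than half of the runs (else None)."""
    n_runs = len(runs)
    result: List[Optional[str]] = []
    for pos in range(len(runs[0])):
        counts = Counter(run[pos] for run in runs if run[pos] is not None)
        best = counts.most_common(1)
        if best and best[0][1] > n_runs / 2:
            result.append(best[0][0])
        else:
            result.append(None)
    return result


def accept_assignment(connectivities: int, posterior: float, n_residues: int) -> bool:
    """Curation rules 1-3 from the text; any one suffices."""
    return (connectivities >= 2
            or (connectivities >= 1 and posterior >= 3.0 / n_residues)
            or posterior > 0.5)


runs = [["a", "b"], ["a", "c"], ["a", "b"], ["d", None]]
print(consensus(runs))   # ['a', None]: "b" appears in only 2 of 4 runs
```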

The energy function used in the annealing routine, \({E}_{tot}\), is calculated as the sum of the energies of the constituent spin systems (Eq.  5 ):

\[{E}_{tot}=\sum _{m}{E}_{m}\tag{5}\]

At any given step during the annealing protocol, spin systems are either tentatively assigned to a position in the sequence or placed in the cache. Cached spin systems are defined as having zero energy (i.e., \({E}_{m}=0\)).

The energy of each spin system tentatively assigned to a specific place in the amino acid sequence is the sum of the adjacency energy (\({E}_{m}^{{adj}}\)) and the chemical shift energy (\({E}_{m}^{{cs}}\)):

\[{E}_{m}={E}_{m}^{adj}+{E}_{m}^{cs}\tag{6}\]

The adjacency energy is related to the degree of correspondence between the averages of the Cα, Cβ and CO resonances of the current spin system and the averages of the Cα(i-1), Cβ(i-1) and CO(i-1) resonances of the spin system tentatively assigned to the subsequent position in the sequence. Each average resonance value in a spin system is calculated as the arithmetic mean of the chemical shifts of that resonance type over all crosspeaks currently in the spin system that contain it. \({E}_{m}^{{adj}}\) therefore captures the evaluation of spin system adjacency and is based on the number of potential connectivities between adjacent spin systems tentatively assigned to the sequence. For example, if spin system m is assigned to the residue position immediately preceding that of spin system l, then the adjacency energy is given by:

\[{E}_{m}^{adj}=\sum _{k}\left[{c}_{0}\exp \left(-\frac{{\left({\delta }_{k(i)}^{m}-{\delta }_{k(i-1)}^{l}\right)}^{2}}{2{\sigma }_{k}^{2}}\right)+{c}_{1}\right]\tag{7}\]

Where \({\delta }_{k\left(i\right)}^{m}\) is the chemical shift of resonance k(i) (Cα(i), Cβ(i) or CO(i)) of spin system m, and \({\delta }_{k\left(i-1\right)}^{l}\) is the chemical shift of resonance k(i−1) (Cα(i−1), Cβ(i−1) or CO(i−1)) of spin system l. \({\sigma }_{k}\) is related to the estimated precision of the measured chemical shifts. When \({c}_{0} < 0\), \({E}_{m}^{adj}\) is a sum of inverted Gaussians; previous assignment algorithms have used functions of this form to good effect for estimating adjacency 26 . In the limit of well-matched connectivities, the sum of inverted Gaussian functions has a minimum value of \(K({c}_{0}+{c}_{1})\), where K is the number of connectivities, whereas for poorly matched putative connectivities the adjacency energy tends to a limit of \(K{c}_{1}\). Importantly, when an expected element of spin system m or l is missing, that contribution to the adjacency energy is set to zero. Similarly, if the subsequent position in the sequence is not currently assigned a spin system, then \({E}_{m}^{adj}=0\). Here, \({c}_{0}\) and \({c}_{1}\) were set to −100 and +50, respectively. For each connectivity, this gives an energy of −50 if the difference in chemical shifts is 0, approaching +50 as the magnitude of the difference approaches infinity. The value of \({\sigma }_{k}\) is influenced by the properties of the NMR spectra from which the spin systems are built. For all runs described, \({\sigma }_{k}\) was chosen so that the function crosses zero (i.e., has an abscissa intercept) at a chemical shift difference of 0.2 ppm for all nuclei k.
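The per-connectivity term can be evaluated numerically to check the quoted limits. This sketch assumes a Gaussian of the form \(\exp(-d^{2}/2{\sigma }_{k}^{2})\); because \({\sigma }_{k}\) is fixed by the 0.2 ppm zero crossing, the resulting function is the same whichever width convention is used. Setting \({c}_{0}e^{-x}+{c}_{1}=0\) gives \(x=\ln(-{c}_{0}/{c}_{1})=\ln 2\). The dictionary keys are hypothetical.

```python
import math
from typing import Dict, Optional

C0, C1 = -100.0, 50.0      # values quoted in the text
INTERCEPT_PPM = 0.2        # zero crossing of each connectivity term

# Solve C0 * exp(-d^2 / (2 sigma^2)) + C1 = 0 at d = 0.2 ppm for sigma.
SIGMA = INTERCEPT_PPM / math.sqrt(2.0 * math.log(-C0 / C1))


def connectivity_energy(shift_i: float, shift_prev: float) -> float:
    """One inverted-Gaussian term: -50 for a perfect match, tending to +50 for a poor one."""
    d = shift_i - shift_prev
    return C0 * math.exp(-d * d / (2.0 * SIGMA ** 2)) + C1


def adjacency_energy(current: Dict[str, float],
                     following: Optional[Dict[str, float]]) -> float:
    """Sum over CA/CB/CO connectivities; missing resonances or an empty next position add 0."""
    if following is None:
        return 0.0
    total = 0.0
    for k in ("CA", "CB", "CO"):
        a = current.get(k)
        b = following.get(f"{k}(i-1)")
        if a is not None and b is not None:
            total += connectivity_energy(a, b)
    return total


print(round(SIGMA, 4))     # about 0.1699 ppm
```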

The second term of the spin system energy, \({E}_{m}^{cs}\), evaluates the degree of correspondence of the observed chemical shifts to those predicted. It is this term that exploits the ability of Bayesian statistics to incorporate diverse degrees of knowledge about the local structure of the protein. These range from relatively structureless information encoded in the simple empirical distributions of amino acid chemical shifts observed in proteins to specific chemical shift predictions based on the high-resolution structure of the protein being examined. For the former, we use the BMRB 52 database. For the latter, we use SHIFTX+ predictions derived from either crystallographic structures available in the PDB 53 or structures predicted by AlphaFold2 42 . In the case of the IDPs V5dm and hIDD, we instead use calculated, sequence-specific random coil chemical shifts 36 , 37 as predictions. \({E}_{m}^{{cs}}\) is ultimately calculated from the Bayesian posterior probability of a proposed assignment given the observed chemical shifts:

\[P\left({A}_{n,m}\cap {Q}_{{m}_{i}}|{B}_{m}\right)=\frac{P\left({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}}\right)P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right)}{P\left({B}_{m}\right)}\tag{8}\]

The subscripts n and m index over all residue positions and the provided spin systems, respectively. The condition \({A}_{n,m}\) denotes the event that spin system m is correctly assigned to sequence position n. The condition \({B}_{m}\) refers to the observed chemical shifts of spin system m. The condition \({Q}_{{m}_{i}}\) denotes the event that resonance type set i of spin system m is the correct resonance type set. Because a spin system can contain crosspeaks with ambiguous resonance types, the probability calculation explicitly considers each resonance type set of a spin system within the context of each residue location. Thus, an assignment entails both the placement of a spin system at a residue location and the choice of a resonance type set.

The prior probability \(P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right)\) is the initial probability that the assignment of spin system m to residue n is correct and that resonance type set i is correct for spin system m. If \({I}_{m}\) is the number of possible resonance type sets of spin system m, then the total number of combinations of resonance type sets and residue locations for spin system m is the product \({I}_{m}N\). However, given the constraints imposed by the amino acid sequence of the protein, not all combinations of sequence location and resonance type set are possible. For example, a resonance type set with a defined amide proton cannot be placed at a proline. To encode the impossibility of certain resonance type set/residue location combinations, these assignments are given a prior probability of 0. The remaining prior probability is then distributed evenly among the remaining combinations:

\[P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right)=\left\{\begin{array}{ll}1/C & \text{if resonance type set } i \text{ can be placed at residue } n\\ 0 & \text{otherwise}\end{array}\right.\tag{9}\]
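
\[ P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right)=\begin{cases}\frac{1}{C}, & \text{if resonance type set } i \text{ of spin system } m \text{ is permitted at residue } n\\ 0, & \text{otherwise}\end{cases} \]
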
where C is the number of possible (non-excluded) combinations of n and \({m}_{i}\) in the sequence.
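As a minimal sketch of this uniform prior, the Python function below enumerates every (n, i) combination for one spin system, zeroes the excluded ones, and spreads the probability evenly over the C that remain. The only exclusion rule encoded is the proline example given above; `uniform_prior` and its `has_amide_h` input are hypothetical names, not part of BARASA.

```python
from typing import Dict, List, Tuple


def uniform_prior(sequence: str, has_amide_h: List[bool]) -> Dict[Tuple[int, int], float]:
    """Uniform prior P(A_{n,m} ∩ Q_{m_i}) for a single spin system m.

    sequence    -- one-letter amino acid sequence; positions n = 0..N-1
    has_amide_h -- has_amide_h[i] is True if resonance type set i includes an
                   amide proton (hypothetical input format)
    Returns {(n, i): prior} over all I_m * N combinations.
    """
    permitted = {}
    for n, residue in enumerate(sequence):
        for i, amide in enumerate(has_amide_h):
            # Example exclusion from the text: a type set with a defined amide
            # proton cannot be placed at a proline.
            permitted[(n, i)] = not (amide and residue == "P")
    c = sum(permitted.values())  # C = number of permitted combinations
    return {key: (1.0 / c if ok else 0.0) for key, ok in permitted.items()}
```

For a spin system with two candidate type sets, only one of which has an amide proton, the proline position keeps a nonzero prior for the other type set, so C counts five of the six combinations in the sequence `MPG`.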

The likelihood of an assignment, \(P\left({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}}\right)\) (i.e., the probability of observing the chemical shifts of spin system m given the assignment \({A}_{n,m}\cap {Q}_{{m}_{i}}\)), is given by Eqs. 10 and 11:
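
\[ {X}_{n,{m}_{i}}^{2}=\sum _{r=1}^{R}{\left(\frac{{\delta }_{{pred},r}^{n}-{\delta }_{{obs},r}^{{m}_{i}}}{{\sigma }_{r}^{n}}\right)}^{2}\qquad (10) \]

\[ P\left({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}}\right)=1-{F}_{{\chi }_{R}^{2}}\left({X}_{n,{m}_{i}}^{2}\right)\qquad (11) \]
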
where \({\delta }_{{pred},r}^{n}\) is the predicted chemical shift of spin r at sequence position n; \({\delta }_{{obs},r}^{{m}_{i}}\) is the observed chemical shift of resonance r of resonance type set i of spin system m; and \({\sigma }_{r}^{n}\) is the standard error of the chemical shift prediction for resonance r at sequence position n. The resonances, indexed by r, are H, N, Cα, Cβ, CO, Cα(i-1), Cβ(i-1), and CO(i-1). Eqs. 10 and 11 assume that the random variable \({\delta }_{{pred},r}^{n}\) is normally distributed about \({\delta }_{{obs},r}^{{m}_{i}}\) with standard deviation \({\sigma }_{r}^{n}\), and that the error in the chemical shift measurement is much smaller than the error in the prediction. Under these assumptions, \({X}_{n,{m}_{i}}^{2}\) follows a chi-square distribution with R degrees of freedom, where R is the number of spins for which data are provided. The likelihood is then the value of the complementary cumulative distribution function (CCDF) of a chi-square variable with R degrees of freedom, evaluated at \({X}_{n,{m}_{i}}^{2}\).
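The likelihood of Eqs. 10 and 11 can be sketched in Python without external libraries. `chi2_sf` evaluates the chi-square CCDF through closed forms of the regularized upper incomplete gamma function that hold for integer degrees of freedom; `likelihood` sums the squared normalized deviations over the spins that have data, so R counts only observed resonances. Both names, and representing a missing resonance as `None`, are assumptions made for illustration rather than BARASA's own code.

```python
import math
from typing import Optional, Sequence


def chi2_sf(x: float, dof: int) -> float:
    """CCDF P(chi2_dof >= x) for an integer number of degrees of freedom."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Q(k, y) = exp(-y) * sum_{j=0}^{k-1} y**j / j!   with k = dof / 2
        term = total = 1.0
        for j in range(1, dof // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Q(k + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{k} y**(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def likelihood(pred: Sequence[float], sigma: Sequence[float],
               obs: Sequence[Optional[float]]) -> float:
    """P(B_m | A_{n,m} ∩ Q_{m_i}): chi-square CCDF at X^2 (Eqs. 10 and 11).

    pred, sigma -- predicted shifts and prediction errors at position n
    obs         -- observed shifts of type set m_i, with None for a missing spin
    """
    x2, r = 0.0, 0
    for p, s, o in zip(pred, sigma, obs):
        if o is None:
            continue  # only spins with data contribute, so R counts them
        x2 += ((p - o) / s) ** 2
        r += 1
    if r == 0:
        raise ValueError("no observed resonances to compare")
    return chi2_sf(x2, r)
```

Where SciPy is available, `scipy.stats.chi2.sf(x, dof)` computes the same quantity; the hand-written version keeps the sketch dependency-free.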

The likelihoods of all other residue position/resonance type set combinations being a valid assignment of spin system m are taken into account through the marginalization, \(P\left({B}_{m}\right)\):
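
\[ P\left({B}_{m}\right)=\sum _{n}\sum _{i}P\left({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}}\right)P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right) \]
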
where the sums run over all possible combinations of i and n. Using Bayes’ theorem as expressed above, the posterior probability (i.e., the probability that a particular assignment is correct given the observed data) can be calculated, and \({E}_{m}^{cs}\) is then determined via:
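Given the priors and likelihoods for one spin system, the marginalization and Bayes’ theorem reduce to a normalization over all (n, i) combinations. The sketch below assumes both are supplied as dictionaries keyed by (n, i); the name `posteriors` is hypothetical, and returning all-zero posteriors when the evidence vanishes is an assumption rather than documented BARASA behavior.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (residue position n, resonance type set i)


def posteriors(prior: Dict[Key, float], like: Dict[Key, float]) -> Dict[Key, float]:
    """P(A_{n,m} ∩ Q_{m_i} | B_m) for every (n, i) of one spin system m.

    prior -- P(A_{n,m} ∩ Q_{m_i}); like -- P(B_m | A_{n,m} ∩ Q_{m_i})
    """
    # Marginalization: P(B_m) = sum over all n and i of likelihood * prior.
    evidence = sum(prior[k] * like[k] for k in prior)
    if evidence == 0.0:
        # Assumption: no combination is supported, so every posterior is 0.
        return {k: 0.0 for k in prior}
    return {k: prior[k] * like[k] / evidence for k in prior}
```

Because a combination with a prior of 0 contributes nothing to the evidence, excluded placements (such as an amide-bearing type set at a proline) always end with a posterior of 0, regardless of how well their shifts happen to match.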

To avoid numerical instability when evaluating logarithms of numbers near zero, and to prevent inaccurate chemical shift predictions from dominating the energy function, values of \({E}_{m}^{cs}\) greater than \({E}_{\max }^{cs}\) are set to \({E}_{\max }^{cs}\). In this study, \({E}_{\max }^{cs}\) and \({E}_{\min }^{cs}\) are set to 100 and −50, respectively.

The values of the energy function parameters were chosen to safeguard against inaccurate chemical shift predictions, based on the following reasoning. A perfectly matching connectivity between two resonances contributes −50 to the final energy function. Because a spin system with a posterior probability of 0 contributes 100, two perfect connectivities, or three or more reasonable connectivities, are required for that spin system to be favorably assigned to that position rather than left in the cache. This allows the algorithm to assign a spin system to a particular location even when the chemical shift predictions for its resonances are highly inaccurate, provided there are sufficient resonance connectivities to justify the assignment. Likewise, \({E}_{\min }^{cs}\) was chosen such that a posterior probability of 1.0 results in an energy contribution of −50, equal to the contribution of a single perfect connectivity. Two bad connectivities would therefore be required to overrule a high posterior probability and disfavor the assignment. The user can adjust \({E}_{\max }^{cs}\) and \({E}_{\min }^{cs}\) to tune the relative influence of the chemical shift energy on the course of an annealing run.
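The arithmetic behind these parameter choices can be checked with a small tally. To avoid assuming a particular mapping from posterior probability to \({E}_{m}^{cs}\), this hypothetical sketch takes the raw (uncapped) \({E}_{m}^{cs}\) as input, with an infinite value standing in for a posterior of 0, caps it at \({E}_{\max }^{cs}\) = 100, and adds −50 for each perfect connectivity. The function and constant names are illustrative only.

```python
from typing import Iterable

E_CS_MAX = 100.0              # E_max^cs used in this study
PERFECT_CONNECTIVITY = -50.0  # contribution of one perfectly matching connectivity


def capped_cs_energy(e_cs: float, e_max: float = E_CS_MAX) -> float:
    """Fix any E_m^cs above E_max^cs at E_max^cs (e.g. an infinite value)."""
    return min(e_cs, e_max)


def spin_system_energy(e_cs: float, connectivities: Iterable[float],
                       e_max: float = E_CS_MAX) -> float:
    """Hypothetical tally: capped chemical-shift term plus connectivity terms."""
    return capped_cs_energy(e_cs, e_max) + sum(connectivities)
```

With a posterior of 0, two perfect connectivities exactly offset the capped penalty of 100, and a third drives the total negative; with a posterior of 1.0 (−50), a single perfect connectivity doubles the favorable contribution.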

Predicted shifts for each resonance can come from any source, provided the precision of the prediction algorithm is accurately estimated. For IDPs, sequence-specific random coil chemical shifts can be substituted (see below). In the absence of an acceptable structural model, the mean and standard deviation of the BMRB chemical shift distribution for a given atom of a given residue type are used as the predicted shift and prediction error, respectively. The same fallback is used for regions of the sequence of interest, such as tags, that are absent from the structural model used to predict chemical shifts, and for regions that are not resolved in that model.

Generation of predicted chemical shifts

Predicted H, N, Cα, Cβ, and CO chemical shifts were generated with SHIFTX+ using PDB entries and/or AlphaFold2-predicted structures (Table 1). Chemical shift prediction errors for H, N, Cα, Cβ, and CO were taken from the reported root-mean-square deviations (RMSD) of SHIFTX+ predictions: 0.45, 2.4, 0.8, 0.95, and 0.9 ppm, respectively. Sequence regions present in the NMR sample but not resolved or not present in the provided structure (e.g., loops or expression tags) were given predicted values equal to the corresponding average values in the BMRB. For runs performed with SPARTA+ 38 predicted shifts, the reported error of each individual prediction was used. For the IDPs V5dm and hIDD, predicted shifts were sequence-specific random coil chemical shifts calculated according to the method in refs. 36, 37. Prediction errors were taken from the reported RMSD of that prediction method: 0.16, 1.0, 0.42, 0.37, and 0.43 ppm for H, N, Cα, Cβ, and CO resonances, respectively. Prediction errors for BMRB-derived values were taken as the standard deviation of the corresponding resonance distribution for the particular amino acid type in the BMRB.
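A minimal sketch of how a predicted shift and its error might be chosen per atom under this scheme: use the model prediction with the method's reported RMSD when one exists, and fall back to BMRB statistics otherwise. The atom labels (`"CA"`, `"CB"`, `"C"`), the `bmrb_stats` lookup, and the function name are hypothetical; the RMSD values are those quoted above. Per-prediction errors, as used for SPARTA+, are not modeled here.

```python
from typing import Dict, Optional, Tuple

# Reported RMSDs (ppm) quoted in the text, used as prediction errors.
SHIFTX_PLUS_ERR = {"H": 0.45, "N": 2.4, "CA": 0.8, "CB": 0.95, "C": 0.9}
RANDOM_COIL_ERR = {"H": 0.16, "N": 1.0, "CA": 0.42, "CB": 0.37, "C": 0.43}


def prediction_for(atom: str, residue: str, model_shift: Optional[float],
                   bmrb_stats: Dict[Tuple[str, str], Tuple[float, float]],
                   errors: Dict[str, float] = SHIFTX_PLUS_ERR) -> Tuple[float, float]:
    """Return (predicted shift, prediction error) for one atom at one position.

    model_shift -- shift predicted from a structural model (or random coil values
                   for an IDP), or None if the position is absent from or
                   unresolved in the model, e.g. an expression tag
    bmrb_stats  -- hypothetical lookup {(residue, atom): (mean, std)} of BMRB values
    """
    if model_shift is not None:
        return model_shift, errors[atom]
    # Fallback: BMRB mean as the prediction, BMRB standard deviation as its error.
    return bmrb_stats[(residue, atom)]
```

Passing `RANDOM_COIL_ERR` as `errors` covers the IDP case, where the model shift comes from sequence-specific random coil values instead of a structure.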

Comparison of resonance assignment algorithms

BARASA was compared with three triple resonance assignment algorithms that are widely used by the NMR community. All algorithms were provided the same crosspeak lists as BARASA, albeit in different file formats. Because FLYA can use predicted chemical shift data, it was provided with the same predicted shifts and associated errors as BARASA. FLYA assignment results were taken from the strong assignments generated from 20 runs. I-PINE was run using the I-PINE server. AutoAssign was run on NMRbox using the default parameters. For each algorithm, proposed assignments were compared with reference assignments: at each residue position, the proposed assignment was classified as matched, mismatched, or missing (see Results and Discussion). The same reference assignments were used to evaluate all algorithms.
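The per-residue comparison can be sketched as a three-way classification. This assumes assignments are represented as a mapping from residue position to a spin-system identifier and that a match means identical identifiers; the precise matching criterion used in the study is given in Results and Discussion, and `score_assignments` is a hypothetical name.

```python
from typing import Dict


def score_assignments(proposed: Dict[int, str], reference: Dict[int, str]) -> Dict[str, int]:
    """Classify each reference position as matched, mismatched, or missing.

    Keys are residue positions; values identify the spin system placed there.
    Positions absent from the reference are not scored (an assumption).
    """
    counts = {"matched": 0, "mismatched": 0, "missing": 0}
    for position, expected in reference.items():
        got = proposed.get(position)
        if got is None:
            counts["missing"] += 1
        elif got == expected:
            counts["matched"] += 1
        else:
            counts["mismatched"] += 1
    return counts
```

Scoring every algorithm against the same `reference` mapping keeps the counts directly comparable across programs.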

Generation of simulated data sets

To assess the performance of BARASA on lower-quality datasets, the MBP crosspeak lists were processed to randomly retain spin systems and/or individual crosspeaks with specific probabilities that depended on crosspeak type. For each data quality condition, 10 independent data sets were randomly generated and BARASA was run on each of them. The results of each BARASA execution were generated by curating 20 independent annealing runs. The performance of BARASA on data containing artifactual peaks was evaluated by taking a depleted data set and adding randomly generated crosspeaks such that 20% of all Cα, Cβ, and CO crosspeaks were artifacts. Each artifact peak was generated as follows. A residue containing an amide group and the desired peak type (Cα, Cβ, CO, Cα(i-1), Cβ(i-1), or CO(i-1)) was chosen at random from the protein sequence. The chemical shift in each dimension of the crosspeak was drawn from a Gaussian distribution whose mean and standard deviation equal the BMRB mean and standard deviation for that atom of that residue type. All artifact peaks were given the maximum peak intensity of their peak lists so that they would not be cached during the run.
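The depletion and artifact-injection steps can be sketched in Python as follows. The `Peak` record, the per-kind `eligible` table of BMRB (mean, std) pairs, adding artifacts separately for each peak kind, and using the overall maximum intensity in place of each peak list's maximum are all simplifying assumptions; BMRB statistics are not reproduced here and must be supplied by the caller. If artifacts are to make up a fraction f of the peaks, the number a added to n real peaks satisfies a/(n + a) = f, so a = nf/(1 − f), which is n/4 for f = 0.2.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Peak:
    spin_system: str
    kind: str                    # e.g. "CA", "CB", "CO", "CA(i-1)", "CB(i-1)", "CO(i-1)"
    shifts: Tuple[float, ...]    # one value per spectral dimension (ppm)
    intensity: float
    artifact: bool = False


def deplete(peaks: List[Peak], keep_system_p: float,
            keep_peak_p: Dict[str, float], rng: random.Random) -> List[Peak]:
    """Randomly retain spin systems, then individual peaks with kind-specific probabilities."""
    systems = sorted({p.spin_system for p in peaks})
    kept = {s for s in systems if rng.random() < keep_system_p}
    return [p for p in peaks
            if p.spin_system in kept and rng.random() < keep_peak_p.get(p.kind, 1.0)]


def n_artifacts(n_real: int, fraction: float) -> int:
    """Number a of artifacts such that a / (n_real + a) equals `fraction`."""
    return round(n_real * fraction / (1.0 - fraction))


def add_artifacts(peaks: List[Peak],
                  eligible: Dict[str, Sequence[Sequence[Tuple[float, float]]]],
                  rng: random.Random, fraction: float = 0.2) -> List[Peak]:
    """Append artifact peaks so each listed kind is `fraction` artifacts.

    eligible[kind] lists, for every residue that has an amide group and the atom
    needed by that kind, the BMRB (mean, std) of each dimension (caller-supplied).
    """
    max_intensity = max(p.intensity for p in peaks)  # so artifacts are never cached
    out = list(peaks)
    for kind in sorted(eligible):
        n_real = sum(1 for p in peaks if p.kind == kind)
        for _ in range(n_artifacts(n_real, fraction)):
            dims = rng.choice(eligible[kind])  # a randomly chosen eligible residue
            shifts = tuple(rng.gauss(mu, sd) for mu, sd in dims)
            out.append(Peak("artifact", kind, shifts, max_intensity, artifact=True))
    return out
```

Seeding `random.Random` differently for each of the 10 data sets per condition gives independent but reproducible replicates.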

BARASA was implemented in C++ and can be built on all major computing platforms (macOS, Linux, and Windows). BARASA has a command-line interface as well as a GUI implemented with the wxWidgets library, and it uses the Boost libraries. For this study, simulations were run on a 2019 6-core MacBook Pro (Intel processor) with up to 12 annealing runs executing in parallel.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

Resonance assignments for IGPS and IL-1Ra have been deposited in the BMRB under accession numbers 51347 and 51352, respectively. Cross peak lists and protein sequences for IL-1β, IL-1Ra, IGPS, MBP, Cy1, ecTS, v5domain, and huIDR, which form the foundation of the analysis here, are included in the Source Data file. BMRB statistics used to test BARASA are also included in the Source Data file. Assignments referenced in this study from the BMRB can be accessed via the following accession codes: 434, 4354, 19082, 18927, and 28135. The experimental structures referenced in this study from the PDB can be accessed via the following accession codes: 9ILB, 2IRT, 1IGS, 1DMB, 7RY6, and 1AOB. Supplementary Information is available and consists of fifteen tables and one figure listing resonance assignments made by BARASA and summary statistics of BARASA’s performance using SPARTA+ predicted chemical shifts, AlphaFold2 structural models, or in the presence of artifact peaks. Source data are provided with this paper.

Code availability

BARASA will be made generally available for non-commercial use, preferably through NMRbox 50 (https://nmrbox.nmrhub.org/) or, alternatively, by contacting [email protected] for Linux- or OSX-compatible executables.

Ikeya, T. et al. Solution NMR views of dynamical ordering of biomacromolecules. Biochim. Biophys. Acta 1862 , 287–306 (2018).

Shimada, I., Ueda, T., Kofuku, Y., Eddy, M. T. & Wüthrich, K. GPCR drug discovery: integrating solution NMR data with crystal and cryo-EM structures. Nat. Rev. Drug Disc. 18 , 59–82 (2019).

Alderson, T. R. & Kay, L. E. NMR spectroscopy captures the essential role of dynamics in regulating biomolecular function. Cell 184 , 577–595 (2021).

Camacho-Zarco, A. R. et al. NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins. Chem. Rev. 122 , 9331–9356 (2022).

Wüthrich, K. Sequential individual resonance assignments in the 1H-NMR spectra of polypeptides and proteins. Biopolymers 22 , 131–138 (1983).

Wüthrich, K., Wider, G., Wagner, G. & Braun, W. Sequential resonance assignments as a basis for determination of spatial protein structures by high resolution proton nuclear magnetic resonance. J. Mol. Biol. 155 , 311–319 (1982).

Billeter, M., Braun, W. & Wüthrich, K. Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J. Mol. Biol. 155 , 321–346 (1982).

Englander, S. W. & Wand, A. J. Main chain directed strategy for the assignment of 1H NMR spectra of proteins. Biochemistry 26 , 5953–5958 (1987).

Di Stefano, D. L. & Wand, A. J. Two-dimensional 1H NMR study of human ubiquitin: a main chain directed assignment and structure analysis. Biochemistry 26 , 7272–7281 (1987).

Wand, A. J. & Nelson, S. J. Refinement of the main chain directed assignment strategy for the analysis of 1H NMR spectra of proteins. Biophys. J. 59 , 1101–1112 (1991).

Nelson, S. J., Schneider, D. M. & Wand, A. J. Implementation of the main chain directed assignment strategy. Computer assisted approach. Biophys. J. 59 , 1113–1122 (1991).

Ikura, M., Kay, L. E. & Bax, A. A novel approach for sequential assignment of 1H, 13C, and 15N spectra of larger proteins: Heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29 , 4659–4667 (1990).

Montelione, G. T. & Wagner, G. Conformation-independent sequential NMR connections in polypeptides by H1-C13-N15 triple-resonance experiments. J. Magn. Reson. 87 , 183–188 (1990).

Driscoll, P. C., Marius Clore, G., Marion, D., Wingfield, P. T. & Gronenborn, A. M. Complete resonance assignment for the polypeptide backbone of interleukin 1β using three-dimensional heteronuclear NMR spectroscopy. Biochemistry 29 , 3542–3556 (1990).

Sattler, M., Schleucher, J. & Griesinger, C. Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Prog. NMR Spectr. 34 , 93–158 (1999).

Frueh, D. P. Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. NMR Spectr. 78 , 47–75 (2014).

Gardner, K. H. & Kay, L. E. The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct. 27 , 357–406 (1998).

Palmer, A. G. Chemical exchange in biomacromolecules: past, present, and future. J. Magn. Reson. 241 , 3–17 (2014).

Tjandra, N. & Bax, A. Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278 , 1111–1114 (1997).

Salmon, L. & Blackledge, M. Investigating protein conformational energy landscapes and atomic resolution dynamics from NMR dipolar couplings: A review. Rep. Prog. Phys. 78 , 126601–126630 (2015).

Clore, G. M. & Gronenborn, A. M. Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination. Prog. NMR Spectr. 23 , 43–92 (1991).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Moseley, H. N. B., Monleon, D. & Montelione, G. T. Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Methods Enzymol. 339 , 91–108 (2001).

Baran, M. C., Huang, Y. J., Moseley, H. N. B. & Montelione, G. T. Automated analysis of protein NMR assignments and structures. Chem. Rev. 104 , 3541–3555 (2004).

Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl Acad. Sci. USA 94 , 12366–12371 (1997).

Hitchens, T. K., Lukin, J. A., Zhan, Y., McCallum, S. A. & Rule, G. S. MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR 25 , 1–9 (2003).

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220 , 671–680 (1983).

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 , 1087–1092 (1953).

Clubb, R. T., Thanabal, V. & Wagner, G. A constant-time three-dimensional triple-resonance pulse scheme to correlate intraresidue 1HN, 15N, and 13C′ chemical shifts in 15N13C-labelled proteins. J. Magn. Reson. 97 , 213–217 (1992).

Grzesiek, S. & Bax, A. Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J. Magn. Reson. 96 , 432–440 (1992).

Bax, A. & Ikura, M. An efficient 3D NMR technique for correlating the proton and 15N backbone amide resonances with the α-carbon of the preceding residue in uniformly 15N/13C enriched proteins. J. Biomol. NMR 1 , 99–104 (1991).

Wittekind, M. & Mueller, L. HNCACB, a high-sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with the alpha- and beta-carbon resonances in proteins. J. Magn. Reson. Ser. B 101 , 201–205 (1993).

Grzesiek, S. & Bax, A. Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114 , 6291–6293 (1992).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Han, B., Liu, Y., Ginzinger, S. W. & Wishart, D. S. SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR 50 , 43–57 (2011).

Kjaergaard, M. & Poulsen, F. M. Sequence correction of random coil chemical shifts: Correlation between neighbor correction factors and changes in the Ramachandran distribution. J. Biomol. NMR 50 , 157–165 (2011).

Kjaergaard, M., Brander, S. & Poulsen, F. M. Random coil chemical shift for intrinsically disordered proteins: Effects of temperature and pH. J. Biomol. NMR 49 , 139–149 (2011).

Shen, Y. & Bax, A. SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. NMR 48 , 13–22 (2010).

Hyberts, S. G., Milbradt, A. G., Wagner, A. B., Arthanari, H. & Wagner, G. Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling. J. Biomol. NMR 52 , 315–327 (2012).

Lee, W. et al. I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73 , 213–222 (2019).

Mishra, S. H. et al. Global protein dynamics as communication sensors in peptide synthetase domains. Sci. Adv. 8 , eabn6549 (2022).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Nerli, S., De Paula, V. S., McShan, A. C. & Sgourakis, N. G. Backbone-independent NMR resonance assignments of methyl probes in large proteins. Nat. Commun. 12 , 691 (2021).

Xu, Y. & Matthews, S. MAP-XSII: an improved program for the automatic assignment of methyl resonances in large proteins. J. Biomol. NMR 55 , 179–187 (2013).

Chao, F. A. et al. FLAMEnGO 2.0: an enhanced fuzzy logic algorithm for structure-based assignment of methyl group resonances. J. Magn. Reson 245 , 17–23 (2014).

Monneau, Y. R. et al. Automatic methyl assignment in large proteins by the MAGIC algorithm. J. Biomol. NMR 69 , 215–227 (2017).

Pritisanac, I., Wurz, J. M., Alderson, T. R. & Güntert, P. Automatic structure-based NMR methyl resonance assignment in large proteins. Nat. Commun. 10 , 4922 (2019).

Pritisanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139 , 9523–9533 (2017).

Delaglio, F. et al. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6 , 277–293 (1995).

Maciejewski, M. W. et al. NMRbox: A resource for biomolecular NMR computation. Biophys. J. 112 , 1529–1534 (2017).

Lee, W., Tonelli, M. & Markley, J. L. NMRFAM-SPARKY: Enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31 , 1325–1327 (2015).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28 , 235–242 (2000).

Gardner, K. H. Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein/β-cyclodextrin complex: Chemical shift assignments and analysis. J. Am. Chem. Soc. 120 , 11738–11748 (1998).

Sapienza, P. J. & Lee, A. L. Backbone and ILV methyl resonance assignments of E. coli thymidylate synthase bound to cofactor and a nucleotide analogue. Biomol. NMR Assign. 8 , 195–199 (2014).

Yang, Y. & Igumenova, T. I. The C-Terminal V5 domain of protein kinase Cα is intrinsically disordered, with propensity to associate with a membrane mimetic. PLoS ONE 8 , e65699 (2013).

Camacho-Zarco, A. R. et al. Molecular basis of host-adaptation interactions between influenza virus polymerase PB2 subunit and ANP32A. Nat. Commun. 11 , 3656 (2020).

Acknowledgements

We are grateful to Dominique Frueh and colleagues for providing crosspeak lists for Cy1 and for fruitful discussions and to Andrew Lee, Martin Blackledge and Tatyana Igumenova for providing crosspeak and/or spin system lists for ecTS, hIDD and V5dm, respectively. We also thank the Texas A&M High Performance Research Computing Center for access to computational resources for the prediction of the Cy1 structure and to NMRbox for access to NMRPipe and other data processing packages. This work was supported by grants from the Mathers Foundation (MF-1809-00155), the National Institutes of Health (GM129076) and Texas A&M University to A.J.W. and by a postdoctoral fellowship from the Gulf Coast Consortium provided by the Cancer Prevention and Research Institute of Texas (RP210043) to A.C.B.

Author information

Authors and affiliations

Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX, 77843, USA

Anthony C. Bishop, Glorisé Torres-Montalvo, Kyle Mimun & A. Joshua Wand

Graduate Group in Biochemistry & Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

Sravya Kotaru & A. Joshua Wand

Department of Chemistry, Texas A&M University, College Station, TX, 77843, USA

A. Joshua Wand

Department of Molecular & Cellular Medicine, Texas A&M University, College Station, TX, 77843, USA

A. Joshua Wand

Contributions

A.C.B. and A.J.W. conceived the algorithm. A.C.B. wrote the computer code to implement the algorithm. A.C.B., S.K., G.T.-M., and A.J.W. tested BARASA. A.C.B., S.K., and G.T.-M. prepared isotopically enriched protein and collected, processed and analyzed NMR data for IL-1β, IGPS, and IL-1Ra, respectively. S.K. and G.T.-M. manually assigned IGPS and IL-1Ra, respectively. K.M. prepared isotopically enriched MBP and analyzed MBP NMR data. A.C.B. collected and processed MBP NMR data. A.C.B. and G.T.-M. ran the test cases through FLYA, AutoAssign and I-PINE. A.C.B. and A.J.W. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to A. Joshua Wand .

Ethics declarations

Competing interests

The authors declare the following competing interests. Texas A&M AgriLife has secured federal copyright of BARASA and will market the program. As inventors, A.C.B. and A.J.W. will receive a share of royalties generated by commercial use. There are no other competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Bishop, A.C., Torres-Montalvo, G., Kotaru, S. et al. Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing. Nat Commun 14 , 1556 (2023). https://doi.org/10.1038/s41467-023-37219-z

Received : 13 April 2022

Accepted : 06 March 2023

Published : 21 March 2023

DOI : https://doi.org/10.1038/s41467-023-37219-z

This article is cited by

Breaking boundaries: TINTO in POKY for computer vision-based NMR walking strategies

  • Andrea Estefania Lopez Giraldo
  • Zowie Werner
  • Woonghee Lee

Journal of Biomolecular NMR (2023)

Chemistry LibreTexts

6.2: Heteronuclear 3D NMR: Resonance Assignment in Proteins

  • Serge L. Smirnov and James McCarty
  • Western Washington University

In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra. In this Chapter we will show how well-resolved 2D 15N-HSQC resonances can be assigned to specific residues and chemical groups within protein samples. As an example, we will consider two complementary types of 3D NMR data, HNCACB and CBCA(CO)NH, and their joint application to heteronuclear NMR resonance assignment in proteins. Such an assignment opens a number of ways to probe the structure and function (e.g., ligand binding) of the target protein.

Learning Objectives

  • Grasp why resonance assignment of a 2D 15N-HSQC spectrum can be beneficial: the case of ligand (drug) binding by a protein (therapeutic target)
  • Become familiar with 3D heteronuclear through-bond (J-coupling) NMR: an introduction and the case of the HNCACB and CBCA(CO)NH pair of 3D experiments
  • Follow an example of the assignment of heteronuclear NMR resonances (1HN, 15NH, 13Cα, 13Cβ) from a combination of 2D 15N-HSQC and 3D HNCACB/CBCA(CO)NH spectra

15N-HSQC as an assay for probing protein–ligand interactions: the need for NMR resonance assignment

During rational drug design, it is often necessary to characterize the interactions between the therapeutic target (protein) and a candidate drug (ligand) beyond determining the binding affinity (Kd). Heteronuclear 15N-HSQC solution NMR experiments can provide significant insight into such interactions. Recall that most of the signals in these 2D spectra originate from backbone H-N amide groups, and a minority from side-chain NH and NH2 groups. The positions of 15N-HSQC resonances are defined by the 1HN and 15NH chemical shifts, which in turn depend on the local electronic environment. Ligand binding changes this environment for the residues forming the binding site, even if the tertiary structure of the rest of the protein is not perturbed. In such a case, the 15N-HSQC resonance pattern undergoes local changes: only the resonances of NH groups involved in the binding site change their position significantly (>0.05 ppm in the 1H and/or >0.2 ppm in the 15N dimension) or their signal intensity (including peak disappearance). Figure VI.2.A illustrates such a change.
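The thresholds quoted above translate directly into a filter over peak positions. In this sketch, peaks are given as hypothetical dictionaries mapping a letter label to (1H, 15N) positions in ppm, and a label missing from the ligand-bound spectrum counts as a disappeared peak; intensity changes short of disappearance are not modeled, and `major_changes` is an illustrative name.

```python
from typing import Dict, List, Tuple


def major_changes(free: Dict[str, Tuple[float, float]],
                  bound: Dict[str, Tuple[float, float]],
                  dh: float = 0.05, dn: float = 0.2) -> List[str]:
    """Labels of 15N-HSQC peaks that shift beyond the thresholds or disappear.

    free, bound -- {label: (1H ppm, 15N ppm)} without and with ligand; a label
                   missing from `bound` is treated as a disappeared peak.
    """
    changed = []
    for label, (h_free, n_free) in free.items():
        if label not in bound:
            changed.append(label)  # peak disappearance
            continue
        h_bound, n_bound = bound[label]
        if abs(h_bound - h_free) > dh or abs(n_bound - n_free) > dn:
            changed.append(label)
    return sorted(changed)
```

Either dimension exceeding its threshold is enough to flag a peak, mirroring the "and/or" criterion in the text.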

Figure VI.2.A

Importantly, every 15N-HSQC resonance in Figure VI.2.A is labeled with a single letter to help identify the specific peaks that change upon ligand binding. These data would have much greater impact if the peaks showing the most pronounced changes in position and/or intensity were assigned to specific amino acid residues within the polypeptide and to chemical groups within those residues (backbone vs. side chain). The rest of this Chapter demonstrates some fundamentals of heteronuclear NMR resonance assignment methodology.

Heteronuclear 3D NMR introduction: CBCA(CO)NH spectrum as an example

Just as every 2D 15N-HSQC resonance reports a J-coupling via a covalent bond between 15N and 1H spin-½ nuclei, there are 3D NMR experiments that report resonances originating from J-coupling (through-bond) among three types of spin-½ nuclei (1H, 13C, 15N). In this section we will introduce two such types of 3D NMR data: HNCACB and CBCA(CO)NH. To produce a protein sample with nearly complete, uniform 13C and 15N labeling, bacterial recombinant protein expression can be performed in minimal medium supplemented with 13C-labeled glucose and 15N-labeled ammonium chloride as the sole sources of carbon and nitrogen, respectively. Figure VI.2.B introduces the general concept of 3D NMR data and shows a portion of a 3D CBCA(CO)NH spectrum.

Figure VI.2.B

Each resonance (“cross-peak”) of a 3D CBCA(CO)NH spectrum indicates a through-bond (scalar J-coupling) interaction between the two atoms of the backbone amide group (1HN and 15NH) of residue j and the Cα and Cβ nuclei (13C) of the preceding residue j-1. The name of the experiment, CBCA(CO)NH, lists the spin-½ nuclei involved in the relevant J-coupling interactions: magnetization is relayed from Cβ and Cα to the NH through the connecting carbonyl carbon, which is shown in parentheses because it does not report any NMR signal (although its magnetization state is affected during the experiment). Two types of residues generate special CBCA(CO)NH peak patterns. Prolines have no amide proton, so no CBCA(CO)NH peaks are linked to their amide groups. Glycine has no Cβ, so for any residue following a glycine only a single CBCA(CO)NH resonance is observed (from the NH of that residue to the glycine Cα).

The NMR resonance assignment: combined use of two complementary datasets HNCACB and CBCA(CO)NH

By itself, CBCA(CO)NH does not convey much sequential information. Another heteronuclear 3D NMR dataset, HNCACB, provides a powerful complement. Like CBCA(CO)NH, HNCACB reports resonances originating from J-coupling between the backbone amide group and Cα/Cβ nuclei. The difference is that HNCACB reports two additional, intra-residue peaks: between the amide HN and the Cα and Cβ spins of the same residue (Figure VI.2.C).
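The peak-count patterns described for the two experiments can be tabulated per residue under idealized assumptions: every expected peak is observed, nothing overlaps, and the N-terminal residue has no observable amide. `expected_peaks` is a hypothetical helper that returns (HNCACB, CBCA(CO)NH) counts for the amide of each residue in a one-letter sequence.

```python
from typing import List, Tuple


def expected_peaks(sequence: str) -> List[Tuple[int, int]]:
    """(HNCACB, CBCA(CO)NH) cross-peak counts expected on each residue's amide.

    Idealized: no missing or overlapping peaks; the N-terminal residue and
    prolines contribute no amide peaks.
    """
    out = []
    for j, res in enumerate(sequence):
        if j == 0 or res == "P":           # no observable amide H-N
            out.append((0, 0))
            continue
        prev = sequence[j - 1]
        intra = 1 if res == "G" else 2     # Calpha(j) [+ Cbeta(j)], HNCACB only
        inter = 1 if prev == "G" else 2    # Calpha(j-1) [+ Cbeta(j-1)], both spectra
        out.append((intra + inter, inter))
    return out
```

For the sequence `MGAPSK`, the alanine following the glycine shows three HNCACB peaks but only one CBCA(CO)NH peak, while the serine after the proline still shows two CBCA(CO)NH peaks because proline has both a Cα and a Cβ.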

Figure VI.2.C

Typically, HNCACB and CBCA(CO)NH are acquired with identical parameters, including the spectral width in all three dimensions and the same number of data points in the 15N dimension (i.e., 15N planes, as in panel B of Figure VI.2.B). Now imagine that we go through every 15N plane and build pairs of “residue j / residue j-1” HNCACB/CBCA(CO)NH peaks. This does not yet give sequence-specific NMR resonance assignments, but it already creates pairs of 3D cross-peaks linked to dipeptides within the sequence. Next, note that for some residue types the 13Cα and 13Cβ chemical shifts differ markedly from those of other residue types. For details, see the BMRB chemical shift statistics for amino acid residues, with emphasis on Gly, Ala, Ser, and Thr. Knowing where such residues are positioned within the polypeptide sequence, we can start “connecting the dots” by mapping HNCACB/CBCA(CO)NH planes and dipeptides onto the actual amino acid sequence.
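The pairing step amounts to matching each spin system's residue j-1 shifts (from CBCA(CO)NH) against the intra-residue shifts (from HNCACB) of every other spin system. The sketch below assumes a hypothetical record of (Cα, Cβ) pairs per spin system, with `None` for an absent Cβ, and a 0.2 ppm tolerance chosen only for illustration; the function names are likewise illustrative.

```python
from typing import Dict, List, Optional, Tuple

CaCb = Tuple[float, Optional[float]]  # (Calpha, Cbeta) in ppm; Cbeta None if absent


def same_residue(a: CaCb, b: CaCb, tol: float) -> bool:
    """True if two (Calpha, Cbeta) pairs agree within tol ppm."""
    if abs(a[0] - b[0]) > tol:
        return False
    if a[1] is None or b[1] is None:
        return a[1] is None and b[1] is None  # both must lack a Cbeta
    return abs(a[1] - b[1]) <= tol


def predecessors(systems: Dict[str, Dict[str, CaCb]],
                 tol: float = 0.2) -> Dict[str, List[str]]:
    """Map each spin system to candidates for residue j-1.

    systems[name]["intra"] -- (Ca, Cb) of residue j from HNCACB
    systems[name]["prev"]  -- (Ca, Cb) of residue j-1 from CBCA(CO)NH
    """
    return {
        name: [other for other, cand in systems.items()
               if other != name and same_residue(cand["intra"], s["prev"], tol)]
        for name, s in systems.items()
    }
```

A spin system with exactly one candidate predecessor can be chained confidently; several candidates signal overlap that needs the residue-type anchors described next, or additional datasets.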

[Figure VI.2.D]

Figure VI.2.D gives a general idea of how the two 3D NMR experiments HNCACB and CBCA(CO)NH can be used together to map the signals onto the amino acid sequence of a protein sample. The Cβ of Ala residues typically has a chemical shift below 20.0 ppm, which is unique. This allows Ala residues to be identified from their HNCACB/CBCA(CO)NH spectral patterns. Starting from these anchor points (as well as from other distinctive values, e.g. Cα for Gly and Cβ for Ser/Thr), one can continue the "connecting the dots" process outlined in Figure VI.2.D to cover the entire sequence. If these two 3D NMR datasets contain resonance overlaps that cannot be resolved, additional pairs of 3D NMR datasets are used in a similar way, e.g. HNCO/HN(CA)CO and others. This process allows nearly all backbone resonances and some side-chain resonances (1HN, 15NH, 13Cα, 13Cβ) to be assigned to specific residues and chemical groups. Methods for assigning side-chain chemical shift values are not discussed in this chapter, but conceptually they are similar to the ones described here.
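The residue-type step can be roughly sketched in two parts. First, each linked spin system is classified by its Cα/Cβ shifts. Second, the resulting pattern is slid along the sequence. The Ala cutoff (Cβ below 20.0 ppm) comes from the text. The Gly cutoff (Cα below 50 ppm) and the Ser/Thr cutoff (Cβ above 60 ppm) are assumptions based on typical BMRB averages and should be checked against the BMRB statistics. All names are hypothetical. The wildcard class "X" is deliberately loose; for example, it does not exclude Pro.

```python
from typing import List, Optional

# Residue classes and the one-letter codes each class allows.
ALLOWED = {"G": "G", "A": "A", "ST": "ST"}


def residue_class(ca: float, cb: Optional[float]) -> str:
    """Coarse residue class from Cα/Cβ shifts (ppm); 'X' means unconstrained."""
    if ca < 50.0:          # assumed cutoff: Gly Cα (~45 ppm) is unusually low
        return "G"
    if cb is None:         # Cβ missing for another reason: no constraint
        return "X"
    if cb < 20.0:          # Ala Cβ, cutoff from the text
        return "A"
    if cb > 60.0:          # assumed cutoff: Ser/Thr Cβ (~64/~70 ppm)
        return "ST"
    return "X"


def place_chain(classes: List[str], sequence: str) -> List[int]:
    """1-based start positions where a chain of residue classes fits the sequence."""
    hits = []
    for start in range(len(sequence) - len(classes) + 1):
        window = sequence[start:start + len(classes)]
        if all(c == "X" or aa in ALLOWED[c] for c, aa in zip(classes, window)):
            hits.append(start + 1)
    return hits


# Hypothetical linked chain: Ala-like, Gly-like, Ser/Thr-like spin systems.
classes = [residue_class(52.5, 19.0), residue_class(45.1, None),
           residue_class(58.3, 63.8)]
print(classes, place_chain(classes, "MKAGSLAGT"))  # → ['A', 'G', 'ST'] [3, 7]
```

In this example sequence, the three-residue pattern fits in two places. This shows why longer chains of linked spin systems are needed before a placement becomes unique.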

With the general process of the protein NMR resonance assignment described, let’s assume that this method was successfully applied to the protein target (T) sample presented in Figure VI.2.A. The resonance assignment completion allows one to replace letter labels with residue-number labels (similar to the ones used in Figure VI.2.D). This in turn allows one to determine the specific residues affected directly or allosterically by binding of the ligand (L) to the target. In many cases, such information together with other data leads to the determination of the ligand binding residues within the target. If the ligand is a candidate therapeutic agent, identification of the ligand binding residues greatly advances ensuing efforts to optimize the drug.

Example 1

Analyze Figure VI.2.A and list at least two resonances that undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Major spectral changes for this model spectrum include resonances moving by >0.05 ppm in the 1H dimension or >0.2 ppm in the 15N dimension, as well as peak disappearance (peak intensity going down to zero).

Upon binding of ligand L to the target protein T, resonance f disappears and resonance s moves by >0.05 ppm in the 1H dimension.
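The criteria in Example 1 can be expressed as a small check over peak lists. This sketch assumes that each peak in the bound spectrum has already been matched to the same letter label it has in the free spectrum. The peak positions below are made-up numbers and are not read from Figure VI.2.A. The function name is hypothetical.

```python
from typing import Dict, Tuple

Peak = Tuple[float, float]  # (1H ppm, 15N ppm)


def major_changes(free: Dict[str, Peak], bound: Dict[str, Peak],
                  h_cut: float = 0.05, n_cut: float = 0.2) -> Dict[str, str]:
    """Flag peaks that disappear or move by more than h_cut (1H) or n_cut (15N)."""
    changes = {}
    for label, (h, n) in free.items():
        if label not in bound:                 # intensity went to zero
            changes[label] = "disappeared"
            continue
        h_b, n_b = bound[label]
        if abs(h_b - h) > h_cut or abs(n_b - n) > n_cut:
            changes[label] = "shifted"
    return changes


# Made-up peak positions, for illustration only.
free = {"p": (8.10, 120.0), "q": (7.90, 115.0), "r": (8.50, 125.0)}
bound = {"q": (7.97, 115.1), "r": (8.51, 125.1)}
print(major_changes(free, bound))  # → {'p': 'disappeared', 'q': 'shifted'}
```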

Example 2

Inspect BMRB entry 50205 and list all the heteronuclear NMR datasets utilized for the NMR resonance assignment.

BMRB entry 50205 contains the chemical shift assignment data for the target sample and offers several ways to look at its underlying NMR data, including the list of experiments used to perform the NMR resonance assignment and the chemical shift values. For example, the NMR-STAR v3 text file has a section titled _Experiment_list, which sums up the heteronuclear NMR data types used for making the assignments: 2D 1H-15N HSQC and 3D HNCACB, CBCA(CO)NH, HNCO and HN(CA)CO.
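Experiment names can be extracted from a downloaded NMR-STAR v3 file using only the Python standard library. This minimal reader makes several assumptions. It expects the experiment loop to use the tag _Experiment.Name, so check the actual file first. It expects values to be whitespace-separated, with quotes around names that contain spaces. It does not handle semicolon-delimited multi-line values. For real work, a dedicated NMR-STAR parser such as the BMRB's PyNMRSTAR library is the safer choice. The function name and the sample text are hypothetical.

```python
import shlex
from typing import List


def experiment_names(star_text: str) -> List[str]:
    """Return _Experiment.Name values from the first loop that declares that tag."""
    lines = star_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "loop_":
            continue
        # Tag names follow loop_ one per line, each starting with "_".
        tags, j = [], i + 1
        while j < len(lines) and lines[j].strip().startswith("_"):
            tags.append(lines[j].split()[0])
            j += 1
        if "_Experiment.Name" not in tags:
            continue
        # Collect data values up to stop_; shlex keeps quoted names intact.
        values: List[str] = []
        while j < len(lines) and lines[j].strip() != "stop_":
            values.extend(shlex.split(lines[j]))
            j += 1
        col, width = tags.index("_Experiment.Name"), len(tags)
        rows = [values[k:k + width] for k in range(0, len(values), width)]
        return [row[col] for row in rows if len(row) == width]
    return []


# Abbreviated, made-up excerpt in NMR-STAR loop syntax.
sample = """
save_experiment_list
   loop_
      _Experiment.ID
      _Experiment.Name
      _Experiment.Sample_ID
      1   '2D 1H-15N HSQC'   1
      2   '3D HNCACB'        1
   stop_
save_
"""
print(experiment_names(sample))  # → ['2D 1H-15N HSQC', '3D HNCACB']
```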

Example 3

How many 3D HNCACB resonances would you expect to originate from a Lys residue which is preceded by a Met?

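Four. The backbone amide (HN) group of Lys correlates with the Cα and Cβ of Lys itself and with the Cα and Cβ of the preceding Met. Since neither residue is Gly, none of these peaks is missing.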

Practice Problems

Problem 1. Analyze Figure VI.2.A and list all the resonances that undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Example 1 above will help you start the analysis.

Problem 2. From the BMRB entry linked to PDB entry 5VNT, list all the heteronuclear NMR datasets used for the NMR resonance assignment of the target sample.

Problem 3. Consider panel B of Figure VI.2.B. Imagine that the 13C dimension is taken out of the spectrum (all 13C planes are collapsed together). What type of 2D spectrum will remain after such a dimension reduction?

Problem 4. How many 3D HNCACB resonances would you expect to originate from a Gly residue that is preceded by a Pro?

Problem 5. How many 3D HNCACB resonances would you expect to originate from a Pro residue that is preceded by a Gly?

Problem 6*. Look up the amino acid chemical shift statistics table in the BMRB repository and list the average values of the following resonances: 15N, 13Cα and 13Cβ for Gly, Ala, Tyr, Glu, Arg, Ser, Thr and Pro. From this analysis, which residue types tend to have unusually low or high chemical shift values compared with the other amino acids?

Welcome to CARA

This is the official website of CARA (Computer Aided Resonance Assignment). CARA is a software application for the analysis of NMR spectra and computer aided resonance assignment, which is particularly suited for biomacromolecules. Dedicated tools for backbone assignment, side chain assignment, and peak integration support the entire process of structure determination. CARA was developed in Professor Kurt Wüthrich's group. Continuing development and support is provided by a group of volunteers.

CARA is free software (see licence terms). Precompiled native executables are provided on all major platforms for easy installation and operation.

CARA can be downloaded from the CARA Downloads page.

CARA documentation is available from the CARA Documentation page.

  • Support site for tutorials, templates, scripts, FAQs, etc.
  • CARA Forum for announcements, support or feature requests
  • Who is using CARA: see our list of users and projects.


  • home.txt · Last modified: 2016/07/26 23:52 by rkeller

NMR sample optimization and backbone assignment of a stabilized neurotensin receptor

Affiliations.

  • 1 Bavarian NMR Center (BNMRZ) and Structural Membrane Biochemistry, Dept. of Bioscience, TUM School of Natural Sciences, Technical University of Munich, 85748 Garching, Germany.
  • 2 Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
  • 3 Bavarian NMR Center (BNMRZ) and Structural Membrane Biochemistry, Dept. of Bioscience, TUM School of Natural Sciences, Technical University of Munich, 85748 Garching, Germany; Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich 85764 Neuherberg, Germany. Electronic address: [email protected].
  • PMID: 37142193
  • PMCID: PMC10242673 (available on 2024-06-01)
  • DOI: 10.1016/j.jsb.2023.107970

G protein-coupled receptors (GPCRs) are involved in a multitude of cellular signaling cascades and consequently are a prominent target for pharmaceutical drugs. In the past decades, a growing number of high-resolution structures of GPCRs has been solved, providing unprecedented insights into their mode of action. However, knowledge of the dynamic nature of GPCRs is equally important for a better functional understanding, which can be obtained by NMR spectroscopy. Here, we employed a combination of size exclusion chromatography, thermal stability measurements and 2D-NMR experiments for the NMR sample optimization of the stabilized neurotensin receptor type 1 (NTR1) variant HTGH4 bound to the agonist neurotensin. We identified the short-chain lipid di-heptanoyl-glycero-phosphocholine (DH7PC) as a promising membrane mimetic for high resolution NMR experiments and obtained a partial NMR backbone resonance assignment. However, internal membrane-incorporated parts of the protein were not visible due to the lack of amide proton back-exchange. Nevertheless, NMR and hydrogen deuterium exchange (HDX) mass spectrometry experiments could be used to probe structural changes at the orthosteric ligand binding site in the agonist and antagonist bound states. To enhance amide proton exchange, we partially unfolded HTGH4 and observed additional NMR signals in the transmembrane region. However, this procedure led to a higher sample heterogeneity, suggesting that other strategies need to be applied to obtain high-quality NMR spectra of the entire protein. In summary, the herein reported NMR characterization is an essential step toward a more complete resonance assignment of NTR1 and for probing its structural and dynamical features in different functional states.

Copyright © 2023 Elsevier Inc. All rights reserved.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Magnetic Resonance Spectroscopy
  • Receptors, G-Protein-Coupled
  • Receptors, Neurotensin* / genetics
  • Receptors, Neurotensin* / metabolism
  • Receptors, Neurotensin

Grants and funding

  • R01 GM129026/GM/NIGMS NIH HHS/United States

Biomolecular NMR Assignments

  • Provides an avenue for depositing these data into a public database at BioMagResBank.
  • Assignment Notes are published in biannual editions in June and December.
  • No page charges or fees for online color images.
  • Optional color images in print and open access publication fees apply.
  • Christina Redfield

Latest issue

Volume 17, Issue 2

Latest articles

Backbone and methyl side-chain resonance assignments of the single chain fab fragment of trastuzumab.

  • Donald Gagné
  • James M. Aramini

1H, 13C, and 15N resonance assignments of the La Motif of the human La-related protein 1

  • Benjamin C. Smith
  • Robert Silvers

1H, 15N and 13C resonance backbone and side-chain assignments and secondary structure determination of the BRCT domain of Mtb LigA

  • Jayanti Vaishnav
  • Ravi Sankar Ampapathi

Chemical shift assignment of dsRBD1 and dsRBD2 of Arabidopsis thaliana DRB3, an essential protein involved in RNAi-mediated antiviral defense

  • Jaydeep Paul
  • Mandar V. Deshmukh

Solution NMR chemical shift assignment of apo and molybdate-bound ModA at two pHs

  • Hiep LD Nguyen
  • Karin A. Crowhurst

Journal information

  • Biological Abstracts
  • Chemical Abstracts Service (CAS)
  • Google Scholar
  • INIS Atomindex
  • Japanese Science and Technology Agency (JST)
  • Norwegian Register for Scientific Journals and Series
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)

© Springer Nature B.V.

Ad Bax Group Software

The Ad Bax Group makes available downloadable software for nuclear magnetic resonance (NMR) research.

Our software is available behind our new, NIH-required login. Multiple login options—including Google and Microsoft—are freely available.

Available software.

  • NMRPipe : Multidimensional spectral processing and analysis of NMR data
  • TALOS : Prediction of Protein Phi and Psi Angles Using a Chemical Shift Database
  • ACME : Measurement of homonuclear proton couplings from regular 2D COSY spectra
  • SSIA : Simulation of Sterically Induced Alignment Tensor
  • PALES : Prediction of ALignmEnt from Structure
  • EHM : Extended Histogram Method for Analysis of Dipolar Couplings
  • HBDB : Database Hydrogen-Bonding Potential for Protein Structure Refinement
  • SAXS : Refinement of Protein Structures Against Small-Angle X-Ray Scattering Data
  • SPARTA : Prediction of Backbone Chemical Shifts from Known Protein Structure
  • CS-ROSETTA : Chemical Shifts Based Protein Structure Prediction Using ROSETTA
  • IDIDC : Iterative DIDC analysis of RDCs
  • FastSAXS : Fast refinement of macromolecular structures against solution x-ray scattering data
  • TALOS+ : A Hybrid method for predicting protein backbone torsion angles from NMR chemical shifts
  • PROMEGA : Prediction of Xaa-Pro peptide bond conformation from sequence and chemical shifts
  • SPARTA+ : Improved Prediction of Backbone Chemical Shifts from Known Protein Structure
  • MICS : Identification of Helix Capping and Beta-turn Motifs from NMR Chemical Shifts
  • VW_fit : Optimization of weights of individual structural models in structural ensemble to achieve best fit for RDCs in multiple alignment media
  • TALOS-N : Protein Backbone and Sidechain Torsion Angles Predicted from NMR Chemical Shifts Using Artificial Neural Networks
  • POMONA : Chemical Shift Homology Modeling using Protein alignments Obtained by Matching Of NMR Assignments
  • MERA : Backbone Torsion Angle Distributions Evaluation in Dynamic and Disordered Proteins from NMR Data
  • SMILE : Sparse Multidimensional Iterative Lineshape-Enhanced (SMILE) Reconstruction of Both Non-Uniformly Sampled and Conventional NMR Data
  • random coil J(HNHa) : IDP random coil 3J(HNHa) coupling constants prediction

Use multiple login options—including Google, Microsoft, or NIH account—to access the Ad Bax Group software.

Click on the “Access Software” button.

NIH badge holders can use the “Smart Card Login” or “Authenticator App.”

Members of the public can scroll to the bottom of the page below the “Authenticator App” box and

  • click on a preferred login option,
  • follow the prompts to enter credentials, and
  • confirm sharing account name with NIH.

Screenshot of the NIH Login page, highlighting the top section for NIH staff login and the bottom section for public login

Protein NMR

A Practical Guide: Double Resonance Backbone Assignment

For smaller proteins, it is possible to do the backbone assignment using only 15N-labelled protein. The spectra used for this are the 15N-NOESY-HSQC and the 15N-TOCSY-HSQC. For each NH group, the 15N-NOESY-HSQC shows all 1H resonances within about 5-7 Å of the NH hydrogen. Assignment relies on the assumption that the NOE between the NH groups of two neighbouring residues is always visible. Two sequential NH groups can therefore be linked because each shows an NOE to the other.

Note that you always end up with a square motif between strips which are linked by an NOE: each strip has an NOE to the diagonal peak of the other strip.
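The square motif can be checked automatically. In this sketch, each strip is described by two things: the 1H shift of its own NH (the diagonal peak) and the list of NOE 1H shifts in that strip. Two strips are linked when each one shows an NOE at the other's diagonal position. The names are hypothetical, and the 0.02 ppm tolerance is an assumption.

```python
from typing import Dict, List, Tuple

# label -> (HN shift of the strip's own diagonal peak, NOE 1H shifts in the strip)
Strips = Dict[str, Tuple[float, List[float]]]


def mutual_noe_links(strips: Strips, tol: float = 0.02) -> List[Tuple[str, str]]:
    """Pairs of strips that form the square motif (reciprocal HN-HN NOEs)."""
    def has_noe(label: str, shift: float) -> bool:
        return any(abs(s - shift) <= tol for s in strips[label][1])

    labels = sorted(strips)
    links = []
    for idx, a in enumerate(labels):
        for b in labels[idx + 1:]:
            if has_noe(a, strips[b][0]) and has_noe(b, strips[a][0]):
                links.append((a, b))
    return links


# Hypothetical strips: a-b and b-c share reciprocal NOEs, d is isolated.
strips = {
    "a": (8.20, [8.20, 7.85, 4.50]),
    "b": (7.85, [7.85, 8.20, 8.61]),
    "c": (8.61, [8.61, 7.85, 4.20]),
    "d": (9.10, [9.10, 4.80]),
}
print(mutual_noe_links(strips))  # → [('a', 'b'), ('b', 'c')]
```

Degenerate NH shifts produce spurious squares. In practice, such links are therefore cross-checked against the secondary-structure NOE patterns and the 15N-TOCSY-HSQC.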

Helical sections are generally easier to assign, as NOEs from NH(i) are visible not only to NH(i±1), but also to NH(i±2) and sometimes NH(i±3).

β-sheet structures include short NH-NH distances between the strands. This means that in addition to the NH(i±1) NOEs, a strong cross-strand NOE is also observed.

Having a rough idea of the secondary structure and topology of the protein can thus significantly aid backbone assignment using double resonance spectra only. Further help with assignment is provided by the 15N-TOCSY-HSQC. This spectrum should show links from the backbone NH group to all side-chain hydrogens of that residue. Using it, the amino acid type can be identified or narrowed down significantly. NOEs to non-amide protons in the 15N-NOESY-HSQC can also be useful during the assignment process, since NH(i)-Hα(i-1) NOEs are generally very strong, in particular in β-sheet sections.

COMMENTS

  1. Robust automated backbone triple resonance NMR assignments of ...

    The combination of fast and robust backbone resonance assignments with structure-based methyl resonance assignments 43,44,45,46,47,48 will reduce the resonance assignment barrier considerably and ...

  2. Triple Resonance Backbone Assignment

    Standard triple resonance backbone assignment of proteins is based on the CBCANNH and CBCA(CO)NNH spectra. The idea is that the CBCANNH correlates each NH group with the Cα and Cβ chemical shifts of its own residue (strongly) and of the preceding residue (weakly). The CBCA(CO)NNH only correlates the NH group to the preceding Cα and Cβ ...

  3. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    Figure VI.2.D An example of combined use of HNCACB and CBCA(CO)NH experiments for the backbone NMR resonance assignment in proteins. Cα and Cβ labels are color coded: blue for intra-residual signals and green for preceding carbons (Cα-1, Cβ-1). HNCACB contours are color-coded: black for positive signals (Cα) and red for negative ones (Cβ).

  4. Practical aspects of NMR signal assignment in larger and challenging

    Backbone assignment is often considered completed when all signals in an H-N correlation map have been assigned. For small proteins, with little overlap, the first step in defining spin systems consists in peak picking all H/N signals in an HSQC spectrum. ... Many NMR assignment software packages feature automated routines for performing this ...

  5. Automatic Assignment

    This automatic backbone assignment programme uses chemical shifts from 3D assignment spectra and secondary structure prediction as its input. ... AutoAssign is an artificial intelligence package for automating the analysis of backbone resonance assignments using triple-resonance NMR spectra of proteins. The new AutoAssign distribution automates ...

  6. NMR Backbone Assignment of Large Proteins by Using 13Cα‐Only Triple

    Nuclear magnetic resonance (NMR) is a powerful tool to interrogate protein structure and dynamics residue by residue. However, the prerequisite chemical-shift assignment remains a bottleneck for large proteins due to the fast relaxation and the frequency degeneracy of the 13Cα nuclei. Herein, we present a covariance NMR strategy to assign the backbone chemical shifts by using only HN(CO)CA ...

  7. Assignment Practice

    Assignment Practice. This section describes how the assignment principles described under Assignment Theory can be put into practice using the CCPNmr Analysis software. There are several ways in which triple resonance backbone assignment, in particular, can be approached in CCPNmr Analysis using more or less automated methods.

  8. APSY-NMR for protein backbone assignment in high-throughput structural

    A standard set of three APSY-NMR experiments has been used in daily practice to obtain polypeptide backbone NMR assignments in globular proteins with sizes up to about 150 residues, which had been identified as targets for structure determination by the Joint Center for Structural Genomics (JCSG) under the auspices of the Protein Structure Initiative (PSI). In a representative sample of 30 ...

  9. Backbone-independent NMR resonance assignments of methyl probes in

    In the conventional approach, backbone assignments are first established using triple-resonance experiments 5. Then, methyl resonances are connected to the backbone using either methyl out-and-back experiments 6 or, more commonly, using 15N- and 13C-edited amide-to-methyl nuclear Overhauser effect (NOE) measurements 7.

  10. Protein NMR Resonance Assignment

    The establishment of the sequential assignment procedure without depending on the existing three-dimensional (3D) structures was, therefore, a milestone for protein NMR. Backbone amide proton (1HN) and α proton (1Hα) signals were sequentially assigned based on the distance information between 1HN(i) and 1Hα(i−1) and between 1H ...

  11. NMR Backbone Assignment of VIM-2 and Identification of the Active

    The NMR backbone resonance assignment for the metallo-β-lactamase VIM-2 (84%) is disclosed, providing the basis for rational development of a clinically applicable inhibitor, which will be a long sought tool in fighting antibiotic resistance.

  12. Protein NMR Resonance Assignment

    Therefore, the establishment of the sequential assignment procedure was a milestone for protein NMR. Backbone amide proton (HN) and α proton (Hα) signals were sequentially assigned based on the distance information between HN(i) and Hα(i−1), and were aligned on the amino acid sequence of the particular protein ...

  13. Home

    CARA is a software application for the analysis of NMR spectra and computer aided resonance assignment which is particularly suited for biomacromolecules. Dedicated tools for backbone assignment, side chain assignment, and peak integration support the entire process of structure determination. CARA was developed in Professor Kurt Wüthrich's ...

  14. Assignment Theory

    The simplest and most straightforward method of backbone resonance assignment involves the use of 15N,13C-labelled protein and the measurement of CBCANNH and CBCA(CO)NNH spectra. Large Proteins. Large proteins give worse NMR spectra, because they tumble more slowly.

  15. NMR sample optimization and backbone assignment of a stabilized

    We identified the short-chain lipid di-heptanoyl-glycero-phosphocholine (DH 7 PC) as a promising membrane mimetic for high resolution NMR experiments and obtained a partial NMR backbone resonance assignment. However, internal membrane-incorporated parts of the protein were not visible due to lacking amide proton back-exchange.

  16. Home

    Biomolecular NMR Assignments is a dedicated forum for publishing sequence-specific resonance assignments for proteins and nucleic acids. ... 1H, 15N and 13C resonance backbone and side-chain assignments and secondary structure determination of the BRCT domain of Mtb LigA. Jayanti Vaishnav; Ravi Sankar Ampapathi.

  17. Ad Bax Group Software

    NMR-related software managed by the Biophysical Nuclear Magnetic Resonance Spectroscopy Section of the Laboratory of Chemical Physics at NIDDK.

  18. Protein NMR

    Much space and discussion is devoted to practical aspects. The implementation of protein NMR assignment is described using the program CCPNmr Analysis. This program has been developed by CCPN and actively seeks input from the NMR community. CCPNmr Analysis is based on the detailed and well thought-out CCPN Data Model which has the advantage (a ...

  19. Fast collective motions of backbone in transmembrane α ...

    ssNMR chemical shift assignments of backbone 1HN of AqpZ in proteoliposomes. In a previous study, ... H. Xie, Y. Zhao, J. Wang, Z. Zhang, J. Yang, Solid-state NMR chemical shift assignments of aquaporin Z in lipid bilayers. Biomol. NMR Assign. 12, 323-328 (2018).
