
Assignment of 1H-NMR spectra

This page deals with how to interpret an NMR spectrum. "Assignment" in the title means assigning each peak to a proton in the molecule under investigation. The examples here are 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di-t-butylbenzo[g]chrysene and cholesteryl acetate.

In the example in fig. 1 of isopropyl-β-D-thiogalactopyranoside (shown without the hydrogens for simplicity; each carbon has four bonds), all the hydroxyls have exchanged with the deuterium oxide solvent to give deuteroxyls. Therefore, the hydroxyl signals do not appear in the spectrum and do not couple with the other signals, making the spectrum simpler.

Fig. 1. 1H-NMR spectrum of isopropyl-β-D-thiogalactopyranoside in D2O

From the integrals, we see that there are two multiples of three, one of which has tall sharp signals so very likely corresponds to the two methyl (CH3) signals. The remaining signals are each expected to yield an integral of one, so the integrals of three and four must arise from overlapping signals. H6 is expected to yield two separate signals because the two protons are diastereotopic (if one of them were replaced with another group, the attached carbon would become a chiral centre). This affects their chemical shifts, so they differ magnetically. If you don't understand this, don't worry; just take it for granted for now.

From the chemical shifts we see that what we suspect are methyls have the appropriate chemical shift and the remaining signals fall in the overlapping CH and CH2 regions as expected. If you are an experienced sugar chemist you will know that the signal with the highest chemical shift is usually the anomeric signal (H1, the hydrogen connected to the carbon next to the sugar ring-closing oxygen).

The coupling patterns can be used to continue the analysis. You could be forgiven for thinking that the methyl signals display an AXY coupling pattern. However, they only couple with the single iPr proton so should yield an AX pattern. The reason for the apparent extra splitting is that the methyls (labeled MeA and MeB) are diastereotopic so have different chemical shifts (they are not magnetically identical, just like the H6 protons). The result is two overlapping AX patterns (fig. 2).

Fig. 2. The methyl doublets of isopropyl-β-D-thiogalactopyranoside in D2O

The iPr proton is coupled to six methyl protons, yielding a septet (fig. 3).

Fig. 3. The iPr septet of isopropyl-β-D-thiogalactopyranoside in D2O
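The doublet and septet described above follow the first-order n+1 rule, in which coupling to n equivalent spin-½ neighbours gives n+1 lines with binomial relative intensities. The following is a minimal sketch of that rule only; the function name is illustrative, and it assumes that the two methyl groups couple to the iPr proton with essentially the same coupling constant.

```python
from math import comb


def multiplet_intensities(n_equivalent_neighbours: int) -> list[int]:
    """Relative line intensities for first-order coupling to n equivalent spin-1/2 nuclei."""
    n = n_equivalent_neighbours
    # Row n of Pascal's triangle: n + 1 lines in total.
    return [comb(n, k) for k in range(n + 1)]


# Each methyl group sees one neighbour (the single iPr proton): a doublet.
print(multiplet_intensities(1))
# The iPr proton sees six methyl protons: a septet.
print(multiplet_intensities(6))
```

The outer lines of the septet are only 1/20 as intense as the central line, which is why they are easy to miss in a noisy spectrum.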

The anomeric H1 is coupled to H2, yielding an AX doublet (fig. 4).

Fig. 4. The anomeric H1 doublet of isopropyl-β-D-thiogalactopyranoside in D2O

H4 has an unusually small coupling to H5 (this occurs when the two CH bonds are approximately at right angles to each other), so small that it is not resolved in a normal spectrum. H4 therefore displays an AX pattern instead of the expected AXY pattern, although the peaks are slightly broadened by the unresolved coupling (fig. 5).

Fig. 5. The H4 multiplet of isopropyl-β-D-thiogalactopyranoside in D2O
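The dependence of a three-bond H-H coupling on the angle between the two CH bonds is commonly described by a Karplus-type relation, J = A cos²φ + B cos φ + C. The sketch below evaluates that form to show why a dihedral angle near 90° gives a very small coupling. The coefficients A, B and C are illustrative assumptions, not values fitted to this sugar, and real coefficients depend on the substituents.

```python
import math


def karplus_3j(dihedral_deg: float, a: float = 7.76, b: float = -1.10, c: float = 1.40) -> float:
    """Estimate a vicinal coupling constant (Hz) from a dihedral angle (degrees).

    a, b and c are assumed, illustrative Karplus coefficients.
    """
    phi = math.radians(dihedral_deg)
    return a * math.cos(phi) ** 2 + b * math.cos(phi) + c


for angle in (0, 60, 90, 180):
    print(f"{angle:3d} deg -> {karplus_3j(angle):.2f} Hz")
```

With these assumed coefficients the coupling is largest for anti-periplanar protons (180°) and falls to little more than the constant term C when the bonds are close to perpendicular.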

H3 couples with both H2 and H4 and yields the expected AXY pattern. H5, H6A and H6B have very similar chemical shifts and strong coupling, which combine to yield a very strongly second-order ABC pattern that is difficult to analyze (fig. 6).

Fig. 6. The H3 and the overlapping H5, H6A and H6B multiplets of isopropyl-β-D-thiogalactopyranoside in D2O

In the example of trans-geraniol in fig. 7 (shown without the hydrogens for simplicity; each carbon has four bonds), proton-5 (H5) is coupled to, and therefore split by, proton-4 (H4); H8 and H9 represent two protons each that are split by each other into triplet AX2 patterns; and H2 is split into four by the three protons at H1, and the resulting quartet is split again by H3. However, second-order coupling distorts the multiplets, making the assignment more difficult.

Fig. 7. Part of the 1H-NMR spectrum of trans-geraniol in CDCl3
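Successive splittings such as "a quartet that is split again" can be sketched by applying one first-order splitting after another. The example below builds a stick pattern of line offsets and relative intensities from a list of couplings. The coupling constants used (7.0 Hz and 1.0 Hz) are hypothetical values chosen for illustration, not measurements from the geraniol spectrum, and the sketch deliberately ignores the second-order distortions mentioned above.

```python
from collections import Counter


def first_order_multiplet(couplings):
    """Return sorted (offset_Hz, relative_intensity) pairs for a first-order multiplet.

    couplings: list of (n_equivalent_neighbours, J_in_Hz) tuples.
    """
    lines = Counter({0.0: 1})
    for n_neighbours, j_hz in couplings:
        # Each spin-1/2 neighbour splits every existing line into two lines at +/- J/2.
        for _ in range(n_neighbours):
            split = Counter()
            for offset, weight in lines.items():
                split[round(offset + j_hz / 2, 6)] += weight
                split[round(offset - j_hz / 2, 6)] += weight
            lines = split
    return sorted(lines.items())


# Triplet from two equivalent neighbours (an AX2 pattern).
print(first_order_multiplet([(2, 7.0)]))
# Quartet from three equivalent neighbours, split again by one further proton.
print(first_order_multiplet([(3, 7.0), (1, 1.0)]))
```

Comparing such an idealised pattern with the measured multiplet gives a quick indication of how strongly second-order effects are distorting the spectrum.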

Chemistry LibreTexts

6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

  • Page ID 398288

  • Serge L. Smirnov and James McCarty
  • Western Washington University

In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra. In this Chapter we will show how the well-resolved 2D 15N-HSQC resonances can be assigned to specific residues and chemical groups within protein samples. As an example, we will consider two complementary types of 3D NMR data, HNCACB and CBCA(CO)NH, and their joint application to heteronuclear NMR resonance assignment in proteins. Such an assignment opens a number of ways to probe structure and function (e.g. ligand binding) for the target protein samples.

Learning Objectives

  • Grasp why the resonance assignment of 2D 15N-HSQC can be beneficial: the case of ligand (drug) binding by a protein (therapeutic target)
  • Become familiar with 3D heteronuclear through-bond (J-coupling) NMR: introduction and the case of the HNCACB and CBCA(CO)NH pair of 3D experiments
  • Follow an example of assignment of heteronuclear NMR resonances (1HN, 15NH, 13Cα, 13Cβ) from a combination of 2D 15N-HSQC and 3D HNCACB/CBCA(CO)NH

15N-HSQC as an assay for probing protein-ligand interactions: the need for the NMR resonance assignment

During rational drug design, it is often necessary to characterize the interactions between the therapeutic target (protein) and candidate drug (ligand) beyond determination of the binding affinity (Kd). The heteronuclear solution NMR experiment 15N-HSQC can provide significant insight into such interactions. Recall that most of the signals in this 2D NMR spectrum originate from backbone H-N amide groups and a minority from side-chain NH and NH2 groups. The positions of 15N-HSQC resonances are defined by the 1HN and 15NH chemical shift values, which in turn depend on the local electronic environment. Ligand binding changes this environment for the residues forming the binding site even if the tertiary structure of the rest of the protein is not perturbed. In such a case, the 15N-HSQC resonance pattern undergoes local changes: only the resonances representing NH groups involved in the binding site change their position significantly (>0.05 ppm in the 1H and/or >0.2 ppm in the 15N dimension) or their signal intensity (including peak disappearance). Figure VI.2.A illustrates such a change.

[Figure VI.2.A]
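The criteria for a "major" spectral change stated above (a shift of more than 0.05 ppm in 1H, more than 0.2 ppm in 15N, or peak disappearance) can be sketched as a simple comparison of peak lists. The peak labels and chemical shift values below are hypothetical and are not read from Figure VI.2.A; a peak that disappears on binding is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peak:
    h_ppm: float  # 1H chemical shift
    n_ppm: float  # 15N chemical shift


def is_major_change(free: Peak, bound: Optional[Peak],
                    h_cutoff: float = 0.05, n_cutoff: float = 0.2) -> bool:
    """Apply the thresholds quoted in the text; a missing bound peak counts as a change."""
    if bound is None:
        return True
    return (abs(bound.h_ppm - free.h_ppm) > h_cutoff
            or abs(bound.n_ppm - free.n_ppm) > n_cutoff)


# Hypothetical peak lists for the free and ligand-bound protein.
free = {"p1": Peak(8.20, 120.0), "p2": Peak(7.90, 115.5), "p3": Peak(8.50, 122.3)}
bound = {"p1": Peak(8.21, 120.1), "p2": Peak(7.98, 115.5), "p3": None}

changed = [label for label in free if is_major_change(free[label], bound[label])]
print(changed)
```

In practice the cut-offs are chosen for each dataset; the values here are simply the ones quoted for this model spectrum.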

Importantly, every 15N-HSQC resonance in Figure VI.2.A is labeled with a single letter to help identify the specific peaks which undergo spectral changes upon ligand binding. These data would have much greater impact if the peaks which underwent the most pronounced changes in position and/or intensity were assigned to specific amino acid residues within the polypeptide and to chemical groups within those residues (backbone vs. side chain). The rest of this Chapter demonstrates some of the fundamentals of the heteronuclear NMR resonance assignment methodology.

Heteronuclear 3D NMR introduction: CBCA(CO)NH spectrum as an example

Just as every 2D 15N-HSQC resonance reports a J-coupling via a covalent bond between a 15N and a 1H spin-½ nucleus, there are 3D NMR experiments which report resonances originating from through-bond J-coupling among three types of spin-½ nuclei (1H, 13C, 15N). In this section we will introduce two such types of 3D NMR data: HNCACB and CBCA(CO)NH. To produce a protein sample with nearly complete uniform labeling with 13C and 15N isotopes, bacterial recombinant protein expression can be performed in a minimal medium supplemented with 13C-labeled glucose and 15N-labeled ammonium chloride as the sole sources of carbon and nitrogen, respectively. Figure VI.2.B introduces the general concept of 3D NMR data and shows an element of a 3D CBCA(CO)NH spectrum.

[Figure VI.2.B]

Each resonance (“cross-peak”) of a 3D CBCA(CO)NH spectrum indicates a through-bond (scalar J-coupling) interaction between the two atoms of the backbone amide group (1HN and 15NH) of residue j and the Cα and Cβ nuclei (13C) of the preceding residue j-1. The name of the experiment, CBCA(CO)NH, refers to the specific spin-½ nuclei involved (and not involved) in the relevant J-coupling interactions: Cβ and Cα are J-coupled to NH, while the connecting carbonyl carbon, shown in parentheses, does not report any NMR signal (although its magnetization state is affected during the experiment). Two types of residues generate special CBCA(CO)NH peak patterns. Prolines have no amide proton, so they do not have CBCA(CO)NH peaks linked with their amide groups. Glycine residues have no Cβ; therefore, for any residue following a glycine only a single CBCA(CO)NH resonance will be observed (from the NH of that residue to the Cα of the preceding glycine).

The NMR resonance assignment: combined use of two complementary datasets HNCACB and CBCA(CO)NH

By itself, CBCA(CO)NH does not convey much sequential information. Another heteronuclear 3D NMR dataset, HNCACB, affords a powerful complement here. Just like CBCA(CO)NH, HNCACB reports resonances originating from J-coupling between the backbone amide group and Cα/Cβ nuclei. The difference is that HNCACB reports two additional peaks, both intra-residual: between HN and the Cα and Cβ spins of the same residue (Figure VI.2.C).

[Figure VI.2.C]
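The peak-counting rules described for the two experiments (no amide proton for Pro, no Cβ for Gly, preceding-residue carbons only in CBCA(CO)NH, preceding plus own carbons in HNCACB) can be sketched as a small function. This is an idealised count that assumes every expected cross-peak is actually observed; the function names and the one-letter sequences are illustrative.

```python
def ca_cb_count(residue: str) -> int:
    """Number of CA/CB carbons a residue can contribute: Gly has no CB."""
    return 1 if residue == "G" else 2


def expected_peaks(sequence: str, index: int, experiment: str) -> int:
    """Idealised number of cross-peaks on the amide of sequence[index] (0-based)."""
    if sequence[index] == "P":
        return 0  # proline has no amide proton
    previous = ca_cb_count(sequence[index - 1]) if index > 0 else 0
    if experiment == "CBCA(CO)NH":
        return previous  # carbons of residue j-1 only
    if experiment == "HNCACB":
        return previous + ca_cb_count(sequence[index])  # residue j-1 plus residue j
    raise ValueError(f"unknown experiment: {experiment}")


# Lys preceded by Met, as in Example 3 later in this chapter.
print(expected_peaks("MK", 1, "HNCACB"))
# Any non-proline residue that follows a glycine, here Lys.
print(expected_peaks("GK", 1, "CBCA(CO)NH"))
```

Real spectra often show fewer peaks than this count because of overlap or low sensitivity, so the function is best read as an upper bound.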

Typically, HNCACB and CBCA(CO)NH are acquired with identical parameters, including the spectral width in all three dimensions and the same number of data points in the 15N dimension (or 15N planes, as in panel B of Figure VI.2.B). Now, let’s imagine that we go through every 15N plane and build the pairs of “residue j / residue j-1” HNCACB/CBCA(CO)NH peaks. This does not yet give us the sequence-specific NMR resonance assignments, but it already creates pairs of 3D cross-peaks linked to dipeptides within the sequence. Next, let’s take into account that for some types of residues the 13Cα and 13Cβ chemical shift values differ remarkably from those of other residue types. For details, take a look at the BMRB chemical shift statistics for amino acid residues, with emphasis on Gly, Ala, Ser and Thr. Knowing where such residues are positioned within the polypeptide sequence, we can start “connecting the dots” by mapping HNCACB/CBCA(CO)NH planes and dipeptides onto the actual amino acid sequence.
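The pairing of “residue j / residue j-1” peaks can be sketched as a matching problem: the Cα/Cβ shifts that one amide reports for its preceding residue should agree, within a tolerance, with the shifts another amide reports for its own residue. The spin-system labels, chemical shifts and the 0.2 ppm tolerance below are hypothetical, and the sketch assumes both Cα and Cβ are available for every spin system (so glycine is not handled).

```python
def follows(own_shifts, prev_shifts, tol=0.2):
    """True if the (CA, CB) shifts of one residue match the 'previous residue' shifts of another."""
    return all(abs(x - y) <= tol for x, y in zip(own_shifts, prev_shifts))


def find_links(systems, tol=0.2):
    """Return (p, q) pairs meaning spin system q directly follows spin system p."""
    return [(p, q)
            for p in systems
            for q in systems
            if p != q and follows(systems[p]["own"], systems[q]["prev"], tol)]


# Hypothetical spin systems: "own" from HNCACB-only peaks, "prev" from CBCA(CO)NH peaks (ppm).
systems = {
    "a": {"own": (52.5, 19.1), "prev": (58.3, 63.9)},
    "b": {"own": (58.3, 63.8), "prev": (56.0, 30.1)},
    "c": {"own": (62.0, 69.5), "prev": (52.4, 19.0)},
}

links = find_links(systems)
print(links)
```

Here the links chain the spin systems in the order b, a, c, which is the kind of dipeptide and tripeptide fragment that is subsequently mapped onto the amino acid sequence.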

[Figure VI.2.D]

Figure VI.2.D provides a general idea of how the two 3D NMR experiments HNCACB and CBCA(CO)NH can be used together to map the signals onto the amino acid sequence of a protein sample. The Cβ of Ala residues typically has chemical shift values below 20.0 ppm, which is unique. This allows identification of Ala patterns in the HNCACB/CBCA(CO)NH spectra. From these starting points (as well as from other distinct values, e.g. Cα for Gly and Cβ for Ser/Thr), one can continue the “connecting the dots” process outlined in Figure VI.2.D to cover the entire sequence. If these two 3D NMR datasets encounter resonance overlaps that are impossible to resolve, more 3D NMR dataset pairs are used in a similar way, e.g. HNCO/HN(CA)CO and others. This process allows assignment of nearly all backbone and some side-chain resonances (1HN, 15NH, 13Cα, 13Cβ) to specific residues and chemical groups. Methods for assigning side-chain chemical shift values are not discussed in this chapter, but conceptually they are similar to the ones described here.
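The use of distinctive Cα/Cβ shifts as starting points can be sketched as a coarse residue-type hint. The 20 ppm cut-off for the Ala Cβ is the value quoted above; the 60 ppm cut-off used for the Ser/Thr Cβ is an assumed, approximate threshold and should be checked against the BMRB statistics before being relied upon. A missing Cβ is taken to indicate Gly.

```python
from typing import Optional


def residue_type_hint(ca_ppm: float, cb_ppm: Optional[float] = None) -> str:
    """Return a coarse residue-type hint from 13C-alpha / 13C-beta shifts (ppm)."""
    if cb_ppm is None:
        return "Gly"      # glycine has no C-beta
    if cb_ppm < 20.0:
        return "Ala"      # threshold quoted in the text
    if cb_ppm > 60.0:
        return "Ser/Thr"  # assumed approximate threshold for an unusually high C-beta
    return "other"


# Hypothetical shift pairs.
print(residue_type_hint(52.5, 19.0))
print(residue_type_hint(45.3))
print(residue_type_hint(58.5, 63.8))
print(residue_type_hint(56.5, 32.8))
```

A hint of this kind only narrows the possibilities; the sequential links between spin systems are still needed to place each residue in the sequence.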

With the general process of protein NMR resonance assignment described, let’s assume that this method was successfully applied to the protein target (T) sample presented in Figure VI.2.A. Completing the resonance assignment allows one to replace the letter labels with residue-number labels (similar to the ones used in Figure VI.2.D). This in turn allows one to determine the specific residues affected directly or allosterically by binding of the ligand (L) to the target. In many cases, such information together with other data leads to the determination of the ligand-binding residues within the target. If the ligand is a candidate therapeutic agent, identification of the ligand-binding residues greatly advances ensuing efforts to optimize the drug.

Example 1

Analyze Figure VI.2.A and list at least two resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Major spectral changes for this model spectrum include resonances moving by >0.05 ppm in the 1H or >0.2 ppm in the 15N dimension, as well as peak disappearance (peak intensity going down to zero).

Upon binding of ligand L to the target protein (T), resonance f disappears and resonance s moves by >0.05 ppm in the 1H dimension.

Example 2

Inspect BMRB entry 50205 and list all the heteronuclear NMR datasets utilized for the NMR resonance assignment.

BMRB entry 50205 contains the chemical shift assignment data for the target sample and offers several ways to look at its underlying NMR data, including the list of experiments used to perform the NMR resonance assignment and the chemical shift values. For example, the NMR-STAR v3 text file has a section titled _Experiment_list, which sums up the heteronuclear NMR data types used for making the assignments: 2D 1H-15N HSQC and 3D HNCACB, CBCA(CO)NH, HNCO and HN(CA)CO.

Example 3

How many 3D HNCACB resonances would you expect to originate from a Lys residue which is preceded by a Met?

Four, as both Lys and Met have backbone amide (HN) groups and both have Cα and Cβ atoms: the Lys HN correlates with its own Cα and Cβ and with the Cα and Cβ of the preceding Met.

Practice Problems

Problem 1. Analyze Figure VI.2.A and list all the resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Example 1 above will help you start the analysis.

Problem 2. From the BMRB entry linked to PDB 5VNT, list all the heteronuclear NMR datasets utilized for the NMR resonance assignment for the target sample.

Problem 3. Let’s consider panel B of Figure VI.2.B. Imagine that the 13C dimension is taken out of the spectrum (all 13C planes are collapsed together). What type of 2D spectrum will remain after such a dimension reduction?

Problem 4. How many 3D HNCACB resonances would you expect to originate from a Gly residue which is preceded by a Pro?

Problem 5. How many 3D HNCACB resonances would you expect to originate from a Pro residue which is preceded by a Gly?

Problem 6*. Look up the amino acid NMR chemical shift statistics table presented in the BMRB repository and list the average values for the following resonances: 15N, 13Cα and 13Cβ for Gly, Ala, Tyr, Glu, Arg, Ser, Thr and Pro. From this analysis, suggest which types of residues tend to report unusually low or high chemical shift values in comparison with the rest of the amino acids.

NMR Methods for Stereochemical Assignments

  • Reference work entry
  • First Online: 01 January 2012

  • Kirk R. Gustafson

Various NMR techniques have been developed that allow assignment of the relative and absolute configuration of many of the stereogenic carbons that occur in marine natural products. The application of chiral anisotropic reagents in conjunction with NMR analyses has been particularly useful for determining the absolute configuration of secondary alcohols, α-substituted primary amines, and α-substituted carboxylic acids. Derivatization of these functional groups with appropriate chiral reagents (e.g., MTPA) provides diastereomeric products that have diagnostic differences in their 1H chemical shifts. A recently developed technique known as J-based configurational analysis uses proton–proton couplings and 2- and 3-bond carbon–proton couplings (2,3JCH) to assign the relative configuration of adjacent (1,2) stereogenic carbons in conformationally flexible molecules. The J-based method involves comparing experimentally measured scalar couplings and NOE interactions with those predicted from this model to assign the relative configuration of the chiral methines. This technique is also applicable to oxygenated systems since there is a dihedral angle dependence for 2JCH couplings between a proton and an adjacent carbon that bears an electronegative oxygen substituent. Strategies have also been developed to utilize J-based configurational analysis when there is a methylene separating the two stereogenic methine carbons, and even when conformational interconversion results in the coexistence of two major conformers.
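The "diagnostic differences" in 1H chemical shifts between diastereomeric MTPA derivatives are usually tabulated as Δδ(SR) = δ(S-ester) − δ(R-ester) for each proton, and the protons are then grouped by the sign of Δδ. The sketch below shows only that bookkeeping step. The proton labels and shift values are hypothetical, and the sketch does not attempt the final step of relating the sign pattern to an absolute configuration.

```python
def delta_sr(shifts_s: dict, shifts_r: dict) -> dict:
    """Per-proton shift difference (ppm) between the (S)- and (R)-MTPA derivatives."""
    return {proton: round(shifts_s[proton] - shifts_r[proton], 3)
            for proton in shifts_s if proton in shifts_r}


def split_by_sign(differences: dict):
    """Group proton labels by the sign of their shift difference."""
    positive = sorted(p for p, value in differences.items() if value > 0)
    negative = sorted(p for p, value in differences.items() if value < 0)
    return positive, negative


# Hypothetical 1H shifts (ppm) for the two diastereomeric esters.
s_ester = {"H-1": 1.25, "H-2": 1.62, "H-4": 2.31, "H-5": 0.95}
r_ester = {"H-1": 1.31, "H-2": 1.70, "H-4": 2.22, "H-5": 0.90}

differences = delta_sr(s_ester, r_ester)
positive, negative = split_by_sign(differences)
print(differences)
print(positive, negative)
```

A consistent split, with all protons on one side of the derivatized carbon showing one sign and those on the other side showing the opposite sign, is what makes the shift differences diagnostic.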

  • Absolute Configuration
  • Relative Configuration
  • Marine Natural Product
  • Electronegative Substituent



Author information

Authors and affiliations.

Molecular Targets Laboratory, Center for Cancer Research, National Cancer Institute-Frederick, Frederick, MD, 21702, USA

Kirk R. Gustafson


Corresponding author

Correspondence to Kirk R. Gustafson .

Editor information

Editors and affiliations.

Facoltà di Farmacia, Dipto. Chimica delle Sostanze Naturali, Università di Napoli, Via Domenico Montesano 49, Napoli, 80131, Italy

Ernesto Fattorusso

Scripps Inst. Oceanography, University of California, San Diego, Gilman Drive 9500, La Jolla, 92093-0213, California, USA

William H. Gerwick

Orazio Taglialatela-Scafati


Copyright information

© 2012 Springer Science+Business Media B.V.

About this entry

Cite this entry.

Gustafson, K.R. (2012). NMR Methods for Stereochemical Assignments. In: Fattorusso, E., Gerwick, W., Taglialatela-Scafati, O. (eds) Handbook of Marine Natural Products. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3834-0_9


Published : 08 March 2012

Publisher Name : Springer, Dordrecht

Print ISBN : 978-90-481-3833-3

Online ISBN : 978-90-481-3834-0

eBook Packages : Biomedical and Life Sciences Reference Module Biomedical and Life Sciences

CCPN

Basics

Beginner Tutorial

Find out how to load and display spectra, change contour levels, zoom, peak pick, work with strips and print spectra.

Introduction to NMR Tutorial

Learn how to open and investigate some spectra and make peak assignments from an imported Chemical Shift List.

BMRB / NMR-STAR Import

Learn how to download an NMR-STAR file from the BMRB, import it into Analysis and create a simulated peak list from the chemical shifts.

Solution NMR

Backbone Assignment Tutorial

Learn how to do the semi-automated backbone assignment of a protein using our dedicated tools.

Side-chain Assignment Tutorial

Find out how you can use a variety of features to help you assign the aliphatic side chains of a protein.

Chemical Shift Perturbation Analysis Tutorial

Learn how to analyse titration data, plot chemical shift changes on a histogram, obtain binding curves and Kd values, and show your results on a structure in PyMOL. NEW & UPDATED – April 2023!

Dynamics Tutorial

This tutorial introduces our new beta-version Relaxation module with which you can analyse T1, T2 and Heteronuclear NOE data and do reduced spectral density mapping.

Solid-state NMR

Sup35 Peptide Assignment Tutorial

Find out how to use Analysis to assign a small uniformly labelled peptide using 13C-detected spectra.

HETs Assignment Tutorial

Use Analysis to assign uniformly labelled HETs218 using 13C-detected spectra.

SH3 Assignment NMR Tutorial

Learn how to use AnalysisAssign to make manual assignments in solid-state NMR spectra, and assign a small protein using [1,3]-13C and [2]-13C labelled glycerol samples.

Structure

Structure Tutorial

Learn how to export structure calculation data to XPLOR and then import and analyse the results. You can follow this tutorial even if you don’t have XPLOR installed on your computer.

NMR Exchange Format (NEF)

How To Import/Export NEF files

Learn how to use NEF to move data between CCPN projects.

How To Import Reference Mixtures from NEF

Find out how to import Reference Mixture data stored in a NEF file, ready to start analysing a new set of screening data.

How To Create a NEF file from Tabular Data

Instructions for how to create a NEF file from old tabular data in order to import it into CcpNmr Analysis or other programs.

How To Manipulate Data with NEF Pipelines

Learn how to export data to MARS or iPine for automatic assignment using the NEF Pipelines software written by Gary Thompson.

NEF Workshop

Watch a recording of the demos from our online NEF Workshop in November 2022.

Screening

Screen Data

Download these data and projects for use with our Screening Tutorials and How-Tos.

Hit Analysis Tutorial

Learn how to use AnalysisScreen, including the use of SpectrumGroups, Pipelines, Hit Analysis, data import from Excel and NEF and creation of NEF Reference Mixture Files.

Mixtures Tutorial

Find out how to generate, analyse and edit compound mixtures for your screening library.

Macro Writing

Macro Writing Tutorial

Learn how easy it is to write Python macros in CcpNmr Analysis Version 3, so you can create shortcuts and manipulate and analyse your data in bespoke ways.

GUI Macro Writing Tutorial

Find out how to make your macros more user friendly by adding a Graphical User Interface (GUI) in the form of a pop-up dialog box.

  • Open access
  • Published: 18 October 2022

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

  • Piotr Klukowski (ORCID: orcid.org/0000-0003-1045-3487),
  • Roland Riek (ORCID: orcid.org/0000-0002-6333-066X) &
  • Peter Güntert (ORCID: orcid.org/0000-0002-2911-7574)

Nature Communications volume 13, Article number: 6151 (2022)


  • Machine learning
  • Solution-state NMR

Nuclear Magnetic Resonance (NMR) spectroscopy is a major technique in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. We present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.
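The benchmark figure quoted above is a median of per-protein RMSD values. As a minimal sketch of what such a number summarises, the code below computes a root-mean-square deviation between two coordinate sets and then the median over several structures. It assumes the two coordinate sets are already superimposed and contain the same atoms in the same order; the atom selection and superposition protocol actually used for ARTINA are not given in this passage, and the coordinates and per-protein values below are made up.

```python
import math
from statistics import median


def rmsd(coords_a, coords_b) -> float:
    """RMSD (same units as the input) between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two made-up, pre-superimposed three-atom structures (coordinates in angstroms).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model, reference))

# Median over hypothetical per-protein RMSD values.
print(median([0.9, 1.2, 1.6, 2.4, 1.4]))
```

A median is less sensitive than a mean to a few poorly determined structures, which is one reason it is a common summary statistic for benchmarks of this kind.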


Introduction

Studying structures of proteins and ligand-protein complexes is one of the most influential endeavors in molecular biology and rational drug design. All key structure determination techniques, X-ray crystallography, electron microscopy, and NMR spectroscopy, have led to remarkable discoveries, but suffer from their respective experimental limitations. NMR can elucidate structures and dynamics of small and medium size proteins in solution 1 and even in living cells 2 . However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist. As a consequence, the percentage of NMR protein structures in the Protein Data Bank (PDB) has decreased from a maximum of 14.6% in 2007 to 7.3% in 2021 ( https://www.rcsb.org/stats ). The problem has sparked research towards automating different tasks in NMR structure determination 3 , 4 , including peak picking 5 , 6 , 7 , 8 , 9 , resonance assignment 10 , 11 , 12 , and the identification of distance restraints 13 , 14 . Several of these methods are available as webservers 15 , 16 . This has enabled semi-automatic 17 , 18 but not yet unsupervised automation of the entire NMR structure determination process, except for a very small number of favorable proteins 7 , 19 .

The advance of machine learning techniques 20 now offers unprecedented possibilities for reliably replacing decisions of human experts by efficient computational tools. Here, we present a method that achieves this goal for NMR assignment and structure determination. We show for a diverse set of 100 proteins that NMR resonance assignments and protein structures can be determined within hours after completing the NMR measurements. Our method, Artificial Intelligence for NMR Applications, ARTINA (Fig. 1), combines machine learning for tasks that are difficult to model otherwise with existing algorithms: evolutionary optimization for resonance assignment with FLYA 12 , chemical shift database searches for torsion angle restraint generation with TALOS-N 21 , ambiguous distance restraints, network-anchoring and constraint combination for NOESY assignment 14 , 22 and simulated annealing by torsion angle dynamics for structure calculation with CYANA 23 . Machine learning is used in multiple flavors: deep residual neural networks 24 for visual spectrum analysis to identify peak positions (pp-ResNet) and to deconvolve overlapping signals (deconv-ResNet) in 25 different types of spectra (Supplementary Table 1), kernel density estimation (KDE) to reconstruct original peak positions in folded spectra, a deep graph neural network 25 , 26 (GNN) for chemical shift estimation within the refinement of chemical shift assignments, and a gradient boosted trees 27 (GBT) model for the selection of structure proposals.

Fig. 1. The flowchart presents the interplay between the main components of the automated protein structure determination workflow: Residual Neural Network (ResNet), FLYA automated chemical shift assignment, Graph Neural Network (GNN), Gradient Boosted Trees (GBT), and CYANA structure calculation.

A major challenge in developing ARTINA was the collection and preparation of a large training data set that is required for machine learning, because, in contrast to assignments and structures, NMR spectra are generally not archived in public data repositories. Instead, we were obliged to collect from different sources and standardize complete sets of multidimensional NMR spectra for the assignment and structure determination of 100 proteins.

In the following work, we describe the algorithm, training and test data, and results of ARTINA automated structure determination, which are on par with those achieved in weeks or months of human experts’ labor.

Benchmark dataset

One of the major obstacles for developing deep learning solutions for protein NMR spectroscopy is the lack of a large-scale standardized benchmark dataset of protein NMR spectra. To date, published manuscripts presenting the most notable methods for computational NMR typically refer to fewer than 50 2D/3D/4D NMR spectra in their experimental sections. Even the well-recognized CASD-NMR competition cannot serve as a major source of training data for deep learning, since only the NOESY spectra of 10 proteins were used in the last round of the event 28 .

To make our study possible, we established a standardized benchmark of 1329 2D/3D/4D NMR spectra, which allows 100 proteins to be recalculated using their original spectral data (Fig.  2 and Supplementary Table  2 ). Each protein record in our dataset contains 5–20 spectra together with manually identified chemical shifts (usually depositions at the Biological Magnetic Resonance Data Bank, BMRB) and the previously determined (“ground truth”) protein structure (PDB record; Supplementary Table  3 ). The benchmark covers protein sizes typically studied by NMR spectroscopy with sequence lengths between 35 and 175 residues (molecular mass 4–20 kDa).

Fig. 2. PDB codes (or names, MH04, MDM2, KRAS4B, if PDB code unavailable) of the 100 benchmark proteins are ordered by the number of residues. The histogram shows the number of spectra for backbone assignment, side-chain assignment, and NOE measurement. Spectrum types in each data set are shown by light to dark blue circles indicating the number of individual spectra of the given type. The percentages of benchmark records that contain a given spectrum type are given at the top. Spectrum types present in less than 5% of the data sets have been omitted.

Automated protein structure determination

The accuracy of protein structure determination with ARTINA was evaluated in a 5-fold cross-validation experiment with the aforementioned benchmark dataset. Five instances of pp-ResNet and GBT were trained, each one using data from about 80% of the proteins for training and the remaining ones for testing. Since each protein was present exactly once in the test set, reported quality metrics were obtained directly in the cross-validation experiment, and no averaging between data splits was required. To deploy pp-ResNet and GBT models in our online system, we constructed an ensemble by averaging predictions of all 5 cross-validation models. The other models were trained only once using either generated data (deconv-ResNet, Supplementary Fig.  1 ) or BMRB depositions excluding all benchmark proteins (GNN, KDE).
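The cross-validation and ensembling scheme described above can be illustrated with a minimal sketch. The function names, the random shuffling, and the representation of a model as a plain callable are assumptions made for illustration; they are not taken from the ARTINA code.

```python
import random


def five_fold_split(protein_ids, n_folds=5, seed=0):
    """Split proteins into n_folds disjoint test sets, so each protein is tested once.

    Returns a list of (train_ids, test_ids) pairs, one per fold.
    """
    ids = list(protein_ids)
    random.Random(seed).shuffle(ids)  # hypothetical: the source does not say how folds were drawn
    folds = [ids[k::n_folds] for k in range(n_folds)]
    splits = []
    for test in folds:
        held_out = set(test)
        train = [p for p in ids if p not in held_out]
        splits.append((train, test))
    return splits


def ensemble_predict(models, x):
    """Average the predictions of all cross-validation models for one input."""
    return sum(model(x) for model in models) / len(models)
```

With 100 proteins and five folds, each model sees 80 proteins for training and 20 for testing, which matches the "about 80%" stated in the text.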

In this experiment, we reproduced 100 structures in a fully automated manner using only NMR spectra and the protein sequences as input. Since ARTINA has no tunable parameters and does not require any manual curation of data, each structure was calculated by a single execution of the ARTINA workflow. All benchmark datasets were analyzed by ARTINA in parallel with execution times of 4–20 h per protein.

All automatically determined structures, overlaid with the corresponding reference structures from the PDB, are visualized in Fig. 3, Supplementary Fig. 2, and Supplementary Movie 1. ARTINA was able to reproduce the reference structures with a median backbone root-mean-square deviation (RMSD) of 1.44 Å between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C’ in the residue ranges determined by CYRANGE 29 (Fig. 4a and Supplementary Table 4). ARTINA automatically identified between 459 and 4678 distance restraints (2198 on average over 100 proteins), which corresponds to 4.25–33.20 restraints per residue (Fig. 4b). This number is mainly influenced by the extent of unstructured regions and the quality of the NOESY spectra. In agreement with earlier findings 30 , it correlates only weakly with the backbone RMSD to reference (linear correlation coefficient −0.38). As a more expressive validation measure for the structures from ARTINA, we computed a predicted RMSD to the PDB reference structure on the basis of the RMSDs between the 10 candidate structure bundles calculated in ARTINA (see “Methods”, Fig. 5, and Supplementary Table 5). The average deviation between actual and predicted RMSDs for the 100 proteins in this study is 0.35 Å, and their linear correlation coefficient is 0.77 (Fig. 5). In no case does the true RMSD exceed the predicted one by more than 1 Å.
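The comparison between actual and predicted RMSD values relies on three simple statistics, which can be sketched as follows. This assumes that the "average deviation" is a mean absolute difference and that the "linear correlation coefficient" is the Pearson coefficient; the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mean_abs_deviation(actual, predicted):
    """Mean absolute difference between actual and predicted RMSD values (assumed definition)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def max_underestimate(actual, predicted):
    """Largest amount by which an actual RMSD exceeds its predicted value."""
    return max(a - p for a, p in zip(actual, predicted))
```

Applied to the per-protein values, the statement that the true RMSD never exceeds the predicted one by more than 1 Å corresponds to `max_underestimate(actual, predicted) <= 1.0`.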

Fig. 3. The structures are aligned with the RMSD to reference range as indicated on the left and hexagonal frames color-coded by their size as indicated above. Structures with no corresponding PDB depositions are marked by an asterisk.

Fig. 4. a Backbone RMSD to reference. b Number of distance restraints per residue. c Chemical shift assignment accuracy. Bars represent quantity values for benchmark proteins, identified by PDB codes (or protein names). Proteins are ordered by size, which is indicated by a color-coded circle. Values in the center of each panel are 10th, 50th, and 90th percentiles of values presented in the bar plot. Short/medium/long-range restraints are between residues i and j with |i – j| ≤ 1, 2 ≤ |i – j| ≤ 4, and |i – j| ≥ 5, respectively.
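The classification of restraints by sequence separation given in the caption can be written down directly. The sketch below assumes that a restraint is represented as a pair of residue numbers (i, j); this representation and the function names are illustrative only.

```python
from collections import Counter


def restraint_range(i, j):
    """Classify a restraint between residues i and j by sequence separation |i - j|."""
    separation = abs(i - j)
    if separation <= 1:
        return "short"
    if separation <= 4:
        return "medium"
    return "long"


def count_by_range(restraints):
    """Count short-, medium-, and long-range restraints in a list of (i, j) pairs."""
    return Counter(restraint_range(i, j) for i, j in restraints)


def restraints_per_residue(n_restraints, n_residues):
    """Number of distance restraints divided by the number of residues."""
    return n_restraints / n_residues
```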

Fig. 5. The predicted RMSD to reference (pRMSD) is calculated from the ARTINA results without knowledge of the reference PDB structure (see “Methods”) and, by definition, always in the range of 0–4 Å. For comparability, actual RMSD values to reference are also truncated at 4 Å (protein 2M47 with RMSD 4.72 Å). The dotted lines represent deviations of ±1 Å between the two RMSD quantities.

Additional structure validation scores obtained from ANSURR 31 (Supplementary Table 6), RPF 32 (Supplementary Table 7), and consensus structure bundles 33 (Supplementary Table 8) confirm that overall the ARTINA structures and the corresponding reference PDB structures are of equivalent quality. Energy refinement of the ARTINA structures in explicit water using OPALp 34 (not part of the standard ARTINA workflow) does not significantly alter the agreement with the PDB reference structures (Supplementary Table 9). The benchmark data set comprises 78 protein structures determined by the Northeast Structural Genomics Consortium (NESG). On average, ARTINA yielded structures of the same accuracy for NESG targets (median RMSD to reference 1.44 Å) as for proteins from other sources (1.42 Å).

On average, ARTINA correctly assigned 90.39% of the chemical shifts (Fig.  4c ), as compared to the manually prepared assignments, including both “strong” (high-reliability) and “weak” (tentative) FLYA assignments 12 . Backbone chemical shifts were assigned more accurately (96.03%) than side-chain ones (86.50%), which is mainly due to difficulties in assigning lysine/arginine (79.97%) and aromatic (76.87%) side-chains. Further details on the assignment accuracy for individual amino acid types in the protein cores (residues with less than 20% solvent accessibility) are given in Supplementary Table  10 . Assignments for core residues, which are important for the protein structure, are generally more accurate than for the entire protein, in particular for core Ala, Cys, and Asp residues, which show a median assignment accuracy of 100% over the 100 proteins. The lowest accuracies are observed for core His (83.3%), Phe (83.3%), and Arg (87.5%) residues. The three proteins with highest RMSD to reference, 2KCD, 2L82, and 2M47 (see below), show 68.2, 83.8, and 75.7% correct aromatic assignments, respectively, well below the corresponding median of 85.5%. On the other hand, the assignment accuracies for the methyl-containing residues Ala, Ile, Val are above average and reach a median of 100, 97.6, and 98.6%, respectively.

The quality of automated structure determination and chemical shift assignment reflects the performance of deep learning-based visual spectrum analysis, presented qualitatively in Figs.  6 – 7 , Supplementary Fig.  3 , and Supplementary Movies  2 – 4 . In this experiment, our models (pp-ResNet, deconv-ResNet) automatically identified 1,168,739 cross-peaks with high confidence (≥0.50) in the benchmark spectra. All 1329 peak lists, together with automatically determined protein structures and chemical shift lists, are available for download.

Fig. 6. A fragment of a 15N-HSQC spectrum of the protein 1T0Y is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses). a1, a2 Initial peak picking marker position is refined by the deconvolution model. b1, b2 pp-ResNet output is deconvolved into two components. c The deconvolution model supports maximally 3 components per initial signal. d Two peak picking markers are merged by the deconvolution model. e Peak picking output deconvolved into three components.

Fig. 7. A fragment of the 13C-HSQC spectrum of protein 2K0M is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses).

Error analysis

The largest deviations from the PDB reference structure were observed for the proteins 2KCD, 2L82, and 2M47, for which the pRMSD consistently indicated low accuracy (Fig.  5 ). Significant deviations are mainly due to displacements of terminal secondary structure elements (e.g., a tilted α-helix near a chain terminus), or inaccurate loop conformations (e.g., more flexible than in the PDB deposition). We investigated the origin of these discrepancies.

2KCD is a 120-residue (14.4 kDa) protein from Staphylococcus saprophyticus with an α-β roll architecture. Its dataset comprises 19 spectra (8 backbone, 6 side-chain, and 5 NOESY). The ARTINA structure has a backbone RMSD to PDB reference of 3.13 Å, which is caused by the displacement of the C-terminal α-helix (residues 105–109; Supplementary Fig.  4a ). Excluding this 5-residue fragment decreases the RMSD to 2.40 Å (Supplementary Table  11 ). The positioning of this helix appears to be uncertain, since an ARTINA calculation without the 4D CC-NOESY spectrum yields a significantly lower RMSD of 1.77 Å (Supplementary Table  12 ).

2L82 is a de novo designed protein of 162 residues (19.7 kDa) with an αβ 3-layer (αβα) sandwich architecture. Although only 9 spectra (4 backbone, 2 side-chain and 3 NOESY) are available, ARTINA correctly assigned 97.87% backbone and 81.05% side-chain chemical shifts. The primary reason for the high RMSD value of 3.55 Å is again a displacement of the C-terminal α-helix (residues 138–153). The remainder of the protein matches closely the PDB deposition (1.04 Å RMSD, Supplementary Fig.  4b ).

The protein with the highest RMSD to reference (4.72 Å) in our benchmark dataset is 2M47, a 163-residue (18.8 kDa) protein from Corynebacterium glutamicum with an α-β 2-layer sandwich architecture, for which 17 spectra (7 backbone, 7 side-chain, and 3 NOESY) are available. The main sources of discrepancy are two α-helices spanning residues 111–157 near the C-terminus. Nevertheless, the residues contributing to the high RMSD value are distributed more extensively than in 2L82 and 2KCD just discussed. Interestingly, 2 of the 10 structure proposals calculated by ARTINA have an RMSD to reference below 2 Å (1.66 Å and 1.97 Å). In the final structure selection step, our GBT model selected the 4.72 Å RMSD structure as the first choice and the 1.66 Å structure as the second one (Supplementary Fig. 4c). Such results imply that the automated structure determination of this protein is unstable. Since ARTINA returns the two structures selected by GBT with the highest confidence, the user can, in principle, choose the better structure based on contextual information.

In addition to these three case studies, we performed a quantitative analysis of all regular secondary structure elements and flexible loops present in our 100-protein benchmark in order to assess their impact on the backbone RMSD to reference (Supplementary Table 11). All residues in the structurally well-defined regions determined by CYRANGE 29 were assigned to 6 partially overlapping sets: (a) first secondary structure element, (b) last secondary structure element, (c) α-helices, (d) β-sheets, (e) α-helices and β-sheets, and (f) loops. Then, the RMSD to reference was calculated 6 times, each time with one set excluded. In total, for 66 of the 100 proteins the lowest RMSD was obtained if set (f) was excluded from the RMSD calculation, and 13 proteins benefited most from removal of the first or last secondary structure element (a or b). Moreover, for 18 out of the 19 proteins with more than 0.5 Å RMSD decrease compared to the RMSD for all well-defined residues, (a), (b), or (f) was the primary source of discrepancy. These results are consistent with our earlier statement that deviations in automatically determined protein structures are mainly caused by terminal secondary structure elements or inaccurate loop conformations.
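The set-exclusion analysis can be illustrated with a simplified sketch. It assumes that a per-residue deviation (in Å) from the reference is already available after a fixed superposition; an actual RMSD calculation would probably re-superimpose the structures for each residue selection, which is not done here. All names are hypothetical.

```python
import math


def rmsd(deviations):
    """Root-mean-square of per-residue deviations (in Å)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def leave_one_set_out(deviation_by_residue, residue_sets):
    """Recompute the RMSD once per set, each time excluding that set's residues.

    deviation_by_residue: {residue_number: deviation in Å}
    residue_sets: {set_name: set of residue numbers to exclude}
    Returns {set_name: RMSD over the remaining residues}, skipping empty selections.
    """
    results = {}
    for name, excluded in residue_sets.items():
        kept = [d for res, d in deviation_by_residue.items() if res not in excluded]
        if kept:
            results[name] = rmsd(kept)
    return results
```

The set whose exclusion gives the lowest value, for example `min(results, key=results.get)`, is then the primary source of discrepancy in the sense used above.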

Ablation studies

During the experiment, we captured the state of each structure determination at 9 time-points, 3 per structure determination cycle: (a) after the initial FLYA shift assignment, (b) after GNN shift refinement, and (c) after structure calculation (Fig.  1 ). Comparative analysis of these states allowed us to quantify the contribution of different ARTINA components to the structure determination process (Table  1 ).

The results show a strong benefit of the refinement cycles, as quantities reported in Table  1 consistently improve from cycle 1 to 3. The majority of benchmark proteins converge to the correct fold after the first cycle (1.56 Å median backbone RMSD to reference), which is further refined to 1.52 Å in cycle 2 and 1.44 Å in cycle 3. Additionally, within each chemical shift refinement cycle, improvements in assignment accuracy resulting from the GNN predictions are observed. This quantity also increases consistently across all refinement cycles, in particular for side-chains. Refinement cycles are particularly advantageous for large and challenging systems, such as 2LF2, 2M7U, or 2B3W, which benefit substantially in cycles 2 and 3 from the presence of the approximate protein fold in the chemical shift assignment step.

Impact of 4D NOESY experiments

As presented in Fig.  2 , 26 out of 100 benchmark datasets contain 4D CC-NOESY spectra, which require long measurement times and were used in the manual structure determination. To quantify their impact, we performed automated structure determinations of these 26 proteins with and without the 4D CC-NOESY spectra (Supplementary Table  12 ).

On average, the presence of 4D CC-NOESY improves the backbone RMSD to reference by 0.15 Å (decrease from 1.88 to 1.73 Å) and has less than 1% impact on chemical shift assignment accuracy. However, the impact is non-uniform. For three proteins, 2KIW, 2L8V, and 2LF2, use of the 4D CC-NOESY decreased the RMSD by more than 1 Å. On the other hand, there is also one protein, 2KCD, for which the RMSD decreased by more than 1 Å by excluding the 4D CC-NOESY.

These results suggest that overall the amount of information stored in 2D/3D experiments is sufficient for ARTINA to reach close to optimal performance, and only modest improvement can be achieved by introducing additional information redundancy from 4D CC-NOESY spectra.

Automated chemical shift assignment

Apart from structure determination, our data analysis pipeline for protein NMR spectroscopy can address an array of problems that are nowadays approached manually or semi-manually. For instance, ARTINA can be stopped after visual spectrum analysis, returning positions and intensities of cross-peaks that can be utilized for any downstream task, not necessarily related to protein structure determination.

Alternatively, a single chemical shift refinement cycle can be performed to get automatically assigned cross-peaks from spectra and sequence. We evaluated this approach with three sets of spectra: (i) Exclusively backbone assignment spectra were used to assign N, Cα, Cβ, C’, and HN shifts. With this input, ARTINA assigned 92.40% (median value) of the backbone shifts correctly. (ii) All through-bond but no NOESY spectra were used to assign the backbone and side-chain shifts. This raised the percentage of correct backbone assignments to 94.20%. (iii) The full data set including NOESY yielded 96.60% correct assignments of the backbone shifts. These three experiments were performed for the 45 benchmark proteins for which CBCANH and CBCAcoNH, as well as either HNCA and HNcoCA or HNCO and HNcaCO experiments were available. The availability of NOESY spectra had a large impact on the side-chain assignments: 86.00% were correct for the full spectra set iii, compared to 73.70% in the absence of NOESY spectra (spectra set ii). The presence of NOESY spectra consistently improved the chemical shift assignment accuracy of all amino acid types (Supplementary Tables 13 and 14). The improvement is particularly strong for aromatic residues (Phe 61.6 to 76.5%, Trp 52.5 to 80%, and Tyr 71.4 to 89.7%), but not limited to this group.

The results obtained with ARTINA differ in several aspects substantially from previous approaches towards automating protein NMR analysis 3 , 4 , 7 , 12 , 17 , 18 , 19 , 35 . First, ARTINA covers the entire workflow from spectra to structures rather than individual steps in it, and there are strictly no manual interventions or protein-specific parameters to be adapted. Second, the quality of the results regarding peak identification, resonance assignments, and structures has been assessed on a large and diverse set of 100 proteins, for the vast majority of which the results are on par with what can be achieved by human experts. Third, the method provides a two-orders-of-magnitude leap in efficiency by providing assignments and a structure within hours of computation time rather than weeks or months of human work. This reduces the effort for a protein structure determination by NMR essentially to the preparation of the sample and the measurement of the spectra. Its implementation in the https://nmrtist.org webserver (Supplementary Movie 5) encapsulates its complexity, eliminates any intermediate data and format conversions by the user, and enables the use of different types of high-performance hardware as appropriate for each of the subtasks. ARTINA is not limited to structure determination but can be used equally well for peak picking and resonance assignment in NMR studies that do not aim at a structure, such as investigations of ligand binding or dynamics.

Although ARTINA has no parameters to be optimized by the user, care should be given to the preparation of the input data, i.e., the choice, measurement, processing, and specification of the spectra. Spectrum type, axes, and isotope labeling declarations must be correct, and chemical shift referencing must be consistent over the entire set of spectra. Slight variations of corresponding chemical shifts within the tolerances of 0.03 ppm for 1H and 0.4 ppm for 13C/15N can be accommodated, but larger deviations, resulting, for instance, from the use of multiple samples, pH changes, protein degradation, or inaccurate referencing, can be detrimental. Where appropriate, ARTINA proposes corrections of chemical shift referencing 36 . Furthermore, based on the large training data set, which comprises a large variety of spectral artifacts, ARTINA largely avoids misinterpreting artifacts as signals. However, with decreasing spectral quality, ARTINA, like a human expert, will progressively miss real signals.
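A user preparing input data might check the consistency of corresponding chemical shifts against the stated tolerances before submission. The sketch below assumes that the tolerance applies to the absolute difference between two observations of the same shift; how ARTINA applies the tolerances internally is not described here, and the names are hypothetical.

```python
# Tolerances quoted in the text, in ppm.
TOLERANCE_PPM = {"1H": 0.03, "13C": 0.4, "15N": 0.4}


def within_tolerance(nucleus, shift_a, shift_b):
    """True if two observations of the same shift agree within the nucleus tolerance."""
    return abs(shift_a - shift_b) <= TOLERANCE_PPM[nucleus]


def inconsistent_shifts(observations):
    """Return the atoms whose observed shifts disagree between any two spectra.

    observations: {atom_name: (nucleus, [shift in ppm from each spectrum])}
    """
    flagged = []
    for atom, (nucleus, shifts) in observations.items():
        # The largest pairwise difference is the spread between extreme values.
        if not within_tolerance(nucleus, max(shifts), min(shifts)):
            flagged.append(atom)
    return flagged
```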

Regarding protein size and spectrum quality, limitations of ARTINA are similar to those encountered by a trained spectroscopist. Machine-learning-based visual analysis of spectra requires signals to be present and distinguishable in the spectra. ARTINA does not suffer from accidental oversight that may affect human spectra analysis. On the other hand, human experts may exploit contextual information to which the automated system currently has no access because it identifies individual signals by looking at relatively small, local excerpts of spectra.

In this paper, we used all spectra that are available from the earlier manual structure determination. For most of the 100 proteins, the spectra data set has significant redundancy regarding information for the resonance assignment. Our results indicate that one can expect to obtain good assignments and structures also from smaller sets of spectra 37 , with concomitant savings of NMR measurement time. We plan to investigate this in a future study.

The present version of ARTINA can be enhanced in several directions. Besides improving individual models and algorithms, it is conceivable to integrate the so far independently trained collection of machine learning models, plus additional models that replace conventional algorithms, into a coherent system that is trained as a whole. Furthermore, the reliability of machine learning approaches depends strongly on the quantity and quality of training data available. While the collection of the present training data set for ARTINA was cumbersome, from now on it can be expected to expand continuously through the use of the https://nmrtist.org website, both quantitatively and qualitatively with regard to greater variability in terms of protein types, spectral quality, source laboratory, data processing (including non-linear sampling), etc., which can be exploited in retraining the models. ARTINA can also be extended to use additional experimental input data, e.g., known partial assignments, stereospecific assignments, ³J couplings, residual dipolar couplings, paramagnetic data, and H-bonds. Structural information, e.g., from AlphaFold 38 , can be used in combination with reduced sets of NMR spectra for rapid structure-based assignment. Finally, the range of application of ARTINA can be generalized to small molecule-protein complexes relevant for structure-activity relationship studies in drug research, protein-protein complexes, RNA, solid state, and in-cell NMR.

Overall, ARTINA stands for a paradigm change in biomolecular NMR from a time-consuming technique for specialists to a fast method open to researchers in molecular biology and medicinal chemistry. At the same time, in a larger perspective, the appearance of generally highly accurate structure predictions by AlphaFold 38 is revolutionizing structural biology. Nevertheless, there remains space for the experimental methods, for instance, to elucidate various states of proteins under different conditions or in dynamic exchange, or for studying protein-ligand interaction. Regarding ARTINA, one should keep in mind that its applications extend far beyond structure determination. It will accelerate virtually any biological NMR studies that require the analysis of multidimensional NMR spectra and chemical shift assignments. Protein structure determination is just one possible ARTINA application, which is both demanding in terms of the amount and quality of required experimental data and amenable to quantitative evaluation.

Spectrum benchmark collection

To collect the benchmark of NMR spectra (Fig.  2 and Supplementary Table  2 ), we implemented a crawler software, which systematically scanned the FTP server of the BMRB data bank 39 , identifying data files relevant to our study. Additional datasets were obtained by setting up a website for the deposition of published data ( https://nmrdb.ethz.ch ), from our collaboration network, or had been acquired internally in our laboratory. NMR data was collected from these channels either in the form of processed spectra (Sparky 40 , NMRpipe 41 , XEASY 42 , Bruker formats), or in the form of time-domain data accompanied by depositor-supplied NMRpipe processing scripts. No additional spectra processing (e.g., baseline correction) was performed as part of this study.

The most challenging aspects of the benchmark collection process were: (i) scarcity of data, since only a small fraction of all BMRB depositions are accompanied by uploaded spectra (or time-domain data); (ii) lack of standards for NMR data depositions, so that each protein data set had to be prepared manually because the original data was stored in different formats (spectra name conventions, axis label standards, spectra data format); and (iii) difficulties in correlating data files deposited in the BMRB FTP site with contextual information about the spectrum and the sample (e.g., sample characteristics, measurement conditions, instrument used). Manually prepared (mostly NOESY) peak lists, which are available from the BMRB for some of the proteins in the benchmark, were not used for this study.

Different approaches to 3D 13C-NOESY spectra measurement had to be taken into account: (i) Two separate 13C-NOESY spectra for aliphatic and aromatic signals. These were analyzed by ARTINA without any special treatment. We used the ALI and ARO tags (Supplementary Movie 5) to provide the information that only aliphatic or only aromatic shifts are expected in a given spectrum. (ii) Simultaneous NC-NOESY. These spectra were processed twice to have proper scaling of the 13C and 15N axes in ppm units, and cropped to extract 15N-NOESY and 13C-NOESY spectra. If nitrogen and carbon cross-peak amplitudes have different signs, we used the POS and NEG tags to provide the information that only positive or only negative signals should be analyzed. (iii) Aliphatic and aromatic signals in a single 13C-NOESY spectrum. These measurements do not require any special treatment, but proper cross-peak unfolding plays a vital role in the analysis of aromatic signals.

Overview of the ARTINA algorithm

ARTINA uses as input only the protein sequence and a set of NMR spectra, which may contain any combination of 25 experiments currently supported by the method (Supplementary Table  1 ). Within 4–20 h of computation time (depending on protein size, number of spectra, and computing hardware load), ARTINA determines: (a) cross-peak positions for each spectrum, (b) chemical shift assignments, (c) distance restraints from NOESY spectra, and (d) the protein structure. The whole process does not require any human involvement, allowing rapid protein NMR assignment and structure determination by non-experts.

The ARTINA workflow starts with visual spectrum analysis (Fig. 1), wherein cross-peak positions are identified in frequency-domain NMR spectra using deep residual neural networks (ResNet) 24 . Coordinates of signals in the spectra are passed as input to the FLYA automated assignment algorithm 12 , yielding initial chemical shift assignments. In the subsequent chemical shift refinement step, we bring to the workflow contextual information about thousands of protein structures solved by NMR in the past using a deep GNN 25 that was trained on BMRB/PDB depositions. Its goal is to predict expected values of yet missing chemical shifts, given the shifts that have already been confidently and unambiguously assigned by FLYA. With these GNN predictions as additional input, the cross-peak positions are reassessed in a second FLYA call, which completes the chemical shift refinement cycle (Fig. 1).

In the structure refinement cycle, 10 variants of NOESY peak lists are generated, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis by varying the confidence threshold of a signal selected by ResNet between 0.05 and 0.5. Each set of NOESY peak lists is used in an independent CYANA structure calculation 22 , 23 , yielding 10 intermediate structure proposals (Fig. 1). The structure proposals are ranked in the intermediate structure selection step based on 96 features with a dedicated GBT model. The selected best structure proposal is used as contextual information in a consecutive FLYA run, which closes the structure refinement cycle.
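The generation of the 10 peak list variants can be sketched as a simple thresholding step. The text gives only the threshold range (0.05 to 0.5) and the number of variants; the even spacing used below is an assumption, as are the function names and the representation of a peak as a (position, confidence) pair.

```python
def confidence_thresholds(low=0.05, high=0.5, n=10):
    """Return n thresholds from low to high (evenly spaced: an assumption)."""
    step = (high - low) / (n - 1)
    return [round(low + k * step, 4) for k in range(n)]


def peak_list_variants(peaks, thresholds):
    """Build one peak list per threshold, keeping peaks at or above that confidence.

    peaks: list of (position, confidence) pairs from the visual spectrum analysis.
    """
    return [[pos for pos, conf in peaks if conf >= t] for t in thresholds]
```

Lower thresholds give longer peak lists that include more tentative signals, while the highest threshold keeps only the most confident peaks, so the variants trade completeness against reliability.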

After the two initial steps of visual spectrum analysis and initial chemical shift assignment, ARTINA alternates between the two refinement cycles. The chemical shift refinement cycle provides FLYA with tighter restraints on expected chemical shifts, which helps to assign ambiguous cross-peaks. The structure refinement cycle provides information about possible through-space contacts, allowing identified cross-peaks (especially in NOESY) to be reassigned. The high-level concept behind this alternation is to iteratively update the protein structure given fixed chemical shifts, and to update the chemical shifts given the fixed protein structure. Both refinement cycles are executed three times.

Automated visual analysis of the spectrum

We established two machine learning models for the visual analysis of multidimensional NMR spectra (see downloads in the Code availability section). In their design, we made no assumptions about the downstream task and the 2D/3D/4D experiment type. Therefore, the proposed models can be used as the starting point of our automated structure determination procedure, as well as for any other task that requires cross-peak coordinates.

The automated visual analysis starts by selecting all extrema \(\boldsymbol{x}=\{\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{N}\}\), \(\boldsymbol{x}_{n}\in\mathbb{N}^{D}\), in the NMR spectrum, which is represented as a D-dimensional regular grid storing signal intensities at discrete frequencies. We formulated the peak picking task as an object detection problem, where possible object positions are confined to \(\boldsymbol{x}\). This task was addressed by training a deep residual neural network 24, in the following denoted as peak picking ResNet (pp-ResNet), which learns a mapping \(\boldsymbol{x}_{n}\to[0,1]\) that assigns to each signal extremum a real-valued score, which resembles its probability of being a true signal rather than an artefact.
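The candidate-selection step can be illustrated with a minimal sketch. It assumes a 2D spectrum stored as a nested list and treats a grid point as an extremum if it is strictly larger or strictly smaller than all of its neighbours; the function name `local_extrema` and the neighbourhood definition are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def local_extrema(grid):
    """Return (row, col) indices of strict local maxima or minima of a 2D grid.

    Assumption: neighbours are the up to 8 adjacent grid points.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    extrema = []
    for r, c in product(range(n_rows), range(n_cols)):
        neighbours = [
            grid[r + dr][c + dc]
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < n_rows
            and 0 <= c + dc < n_cols
        ]
        value = grid[r][c]
        if all(value > v for v in neighbours) or all(value < v for v in neighbours):
            extrema.append((r, c))
    return extrema

# Toy spectrum: one positive peak at (1, 1) and one negative peak at (2, 3).
spectrum = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 5.0, 0.2, 0.0],
    [0.0, 0.2, 0.1, -3.0],
]
candidates = local_extrema(spectrum)
```

Weak noise extrema also end up in `candidates`, which is the intended behaviour here: deciding which candidates are true signals is left to the classifier.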

Our network architecture is strongly linked to ResNet-18 24. It contains 8 residual blocks, followed by a single fully connected layer with sigmoidal activation. After weight initialization with Glorot Uniform 43, the architecture was trained by optimizing a binary cross-entropy loss using Adam 44 with learning rate \(10^{-4}\) and gradient clipping of 0.5.

To establish an experimental training dataset for pp-ResNet, we normalized the 1329 spectra in our benchmark with respect to resolution (adjusting the number of data grid points per unit chemical shift (ppm) using linear interpolation) and signal amplitude (scaling the spectrum by a constant). Subsequently, 675,423 diverse 2D fragments of size 256 × 32 × 1 were extracted from the normalized spectra and manually annotated, yielding 98,730 positive and 576,693 negative class training examples. During the training process, we additionally augmented this dataset by flipping spectrum fragments along the second dimension (32 pixels), stretching them by 0–30% in the first and second dimensions, and perturbing signal intensities with Gaussian noise addition.

The role of the pp-ResNet is to quickly iterate over signal extrema in the spectrum, filtering out artefacts and selecting approximate cross-peak positions for the downstream task. The relatively small network architecture (8 residual blocks) and input size of 2D 256 × 32 image patches make it possible to analyze large 3D ¹³C-resolved NOESY spectra in less than 5 min on a high-end desktop computer. Simultaneously, the first dimension of the image patch (256 pixels) provides long-range contextual information on the possible presence of signals aligned with the current extremum (e.g., Cα, Cβ cross-peaks in an HNCACB spectrum).

Extrema classified with high confidence as true signals by pp-ResNet undergo subsequent analysis with a second deep residual neural network (deconv-ResNet). Its objective is to perform signal deconvolution, based on a 3D spectrum fragment (64 × 32 × 5 voxels) that is cropped around a signal extremum selected by pp-ResNet. This task is defined as a regression problem, where deconv-ResNet outputs a 3 × 3 matrix storing 3D coordinates of up to 3 deconvolved peak components, relative to the center of the input image. To ensure permutation invariance with respect to the ordering of components in the output coordinate matrix, and to allow for a variable number of 1–3 peak components, the architecture was trained with a Chamfer distance loss 45 .
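To illustrate why a Chamfer-type loss is insensitive to the ordering of the output components, the following sketch computes a symmetric Chamfer distance between two small point sets. The squared-distance, mean-normalized form used here is one common definition and is an assumption; the exact variant used for deconv-ResNet is not specified in this section.

```python
import math

def chamfer_distance(predicted, target):
    """Symmetric Chamfer distance between two non-empty sets of points.

    Each point is matched to its nearest neighbour in the other set, so the
    result does not depend on the order in which points are listed.
    """
    def one_way(a, b):
        return sum(min(math.dist(p, q) ** 2 for q in b) for p in a) / len(a)

    return one_way(predicted, target) + one_way(target, predicted)

# Two peak components given in different order: the distance is zero.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
swapped = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
same_components = chamfer_distance(swapped, target)

# A single component displaced by one unit along the third axis.
displaced = chamfer_distance([(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)])
```

Because every point only looks for its nearest counterpart, the two sets may also contain different numbers of points, which is what makes this family of losses convenient for 1 to 3 components.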

Since deconv-ResNet deals only with true signals and their local neighborhood, its training dataset can be conveniently generated. We established a spectrum fragment generator, based on rules reflecting the physics of NMR, which produced 110,000 synthetic training examples (Supplementary Fig.  1 ) having variable (a) numbers of components to deconvolve (1–3), (b) signal-to-noise ratio, (c) component shapes (Gaussian, Lorentzian, and mixed), (d) component amplitude ratios, (e) component separation, and (f) component neighborhood type (i.e., NOESY-like signal strips or HSQC-like 2D signal clusters). The deconv-ResNet model was thus trained on fully synthetic data.

Signal unaliasing

To use ResNet predictions in automated chemical shift assignment and structure calculation, detected cross-peak coordinates must be transformed from the spectrum coordinate system to their true resonance frequencies. We addressed the problem of automated signal unfolding with the classical machine learning approach to density estimation.

At first, we generated \(10^{5}\) cross-peaks associated with each experiment type supported by ARTINA (Supplementary Table 1). In this process, we used randomly selected chemical shift lists deposited in the BMRB database, excluding depositions associated with our benchmark proteins. Subsequently, we trained a Kernel Density Estimator (KDE):

\[ p_{e}(\boldsymbol{x}) = \frac{1}{N_{e}} \sum_{i=1}^{N_{e}} \kappa\left(\boldsymbol{x} - \boldsymbol{x}_{i}^{(e)}\right) \]

which captures the distribution \(p_{e}(\boldsymbol{x})\) of true peaks being present at position \(\boldsymbol{x}\) in spectrum type \(e\), based on \(N_{e}=10^{5}\) cross-peak coordinates \(\boldsymbol{x}_{i}^{(e)}\) generated with BMRB data, with \(\kappa\) being the Gaussian kernel.

Unfolding a k-dimensional spectrum is defined as a discrete optimization problem, solved independently for each cross-peak \(\boldsymbol{x}_{j}^{(e)}\) observed in a spectrum of type \(e\):

\[ \boldsymbol{s}^{*} = \underset{\boldsymbol{s}}{\arg\max}\; p_{e}\left(\boldsymbol{x}_{j}^{(e)} + \boldsymbol{s} \circ \boldsymbol{w}\right) \]

where \(\boldsymbol{w}\in\mathbb{R}^{k}\) is a vector storing the spectral widths in each dimension (ppm units), \(\circ\) is element-wise multiplication, \(\boldsymbol{s}\in\mathbb{Z}^{k}\) is a vector indicating how many times the cross-peak is unfolded in each dimension, and \(\boldsymbol{s}^{*}\in\mathbb{Z}^{k}\) is the optimal cross-peak unfolding.
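The unfolding idea can be sketched in a few lines: estimate a density of plausible peak positions with Gaussian kernels, then shift an observed peak by integer multiples of the spectral width and keep the shift with the highest density. The isotropic bandwidth, the search range `max_folds`, and the four toy "database" peaks below are assumptions made for illustration only.

```python
import math
from itertools import product

def gaussian_kde(points, bandwidth):
    """Return a density function: the mean of Gaussian kernels centred on points."""
    dim = len(points[0])
    norm = (2 * math.pi * bandwidth ** 2) ** (dim / 2)

    def density(x):
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / (len(points) * norm)

    return density

def unfold(peak, widths, density, max_folds=1):
    """Find the integer vector s that maximizes density(peak + s * widths)."""
    def shifted(s):
        return tuple(x + si * w for x, si, w in zip(peak, s, widths))

    candidates = product(range(-max_folds, max_folds + 1), repeat=len(peak))
    best = max(candidates, key=lambda s: density(shifted(s)))
    return best, shifted(best)

# Toy (1H, 15N) positions standing in for peaks generated from database shifts.
reference_peaks = [(8.0, 118.0), (8.3, 121.0), (7.9, 124.0), (8.5, 115.0)]
p_e = gaussian_kde(reference_peaks, bandwidth=2.0)

# A peak observed at 90 ppm in a 15N dimension with 30 ppm spectral width.
best_s, unfolded = unfold((8.2, 90.0), widths=(4.0, 30.0), density=p_e)
```

In this toy case the peak is moved by one spectral width in the second dimension and left untouched in the first. A practical implementation would use a per-dimension bandwidth, since ¹H and ¹⁵N shifts live on very different scales.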

As long as regular and folded signals do not overlap or have different signs in the spectrum, KDE can unfold the peak list regardless of spectrum dimensionality. The spectrum must not be cropped in the folded dimension, i.e., the folding sweep width must equal the width of the spectrum in the corresponding dimension.

All 2D/3D spectra in our benchmark were folded in at most one dimension and satisfy the aforementioned requirements. However, the 4D CC-NOESY spectra satisfy neither, as regular and folded peaks both overlap and have the same signal amplitude sign. This introduces ambiguity in the spectrum unfolding that prevents direct use of the KDE technique. To retrieve original signal positions, 4D CC-NOESY cross-peaks were unfolded to overlap with signals detected in 3D ¹³C-NOESY. In consequence, 4D CC-NOESY unfolding depended on other experiments, and individual 4D cross-peaks were retained only if they were confirmed in a 3D experiment.

Chemical shift assignment

Chemical shift assignment is performed with the existing FLYA algorithm 12 that uses a genetic algorithm combined with local optimization to find an optimal matching between expected and observed peaks. FLYA uses as input the protein sequence, lists of peak positions from the available spectra, chemical shift statistics, either from the BMRB 39 or the GNN described in the next section, and, if available, the structure from the previous refinement cycle. The tolerance for the matching of peak positions and chemical shifts was set to 0.03 ppm for ¹H, and 0.4 ppm for ¹³C/¹⁵N shifts. Each FLYA execution comprises 20 independent runs with identical input data that differ in the random numbers used in the optimization algorithm. Nuclei for which at least 80% of the 20 runs yield, within tolerance, the same chemical shift value are classified as reliably assigned 12 and used as input for the following chemical shift refinement step.
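The 80% consensus criterion can be made concrete with a short sketch. It assumes that the consensus value is the median over the independent runs, which is a simplification chosen for illustration; FLYA's own consensus definition may differ. The tolerances are those quoted above.

```python
from statistics import median

# Matching tolerances in ppm, as stated in the text.
TOLERANCE_PPM = {"H": 0.03, "C": 0.4, "N": 0.4}

def is_reliably_assigned(shifts, element, min_fraction=0.8):
    """Check whether enough runs agree with the consensus shift within tolerance.

    Assumption: the consensus shift is taken as the median over all runs.
    """
    consensus = median(shifts)
    tolerance = TOLERANCE_PPM[element]
    agreeing = sum(abs(s - consensus) <= tolerance for s in shifts)
    return agreeing / len(shifts) >= min_fraction

# 17 of 20 runs agree on 8.21 ppm for an amide proton: reliable (85%).
mostly_consistent = is_reliably_assigned([8.21] * 17 + [7.50] * 3, "H")

# Only 15 of 20 runs agree: not reliable (75%).
too_scattered = is_reliably_assigned([8.21] * 15 + [7.50] * 5, "H")
```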

Chemical shift refinement

We used a graph data structure to combine FLYA-assigned shifts with information from previously assigned proteins (BMRB records) and possible spatial interactions. Each node corresponds to an atom in the protein sequence, and is represented by a feature vector composed of (a) a one-hot encoded atom type code (e.g., Cα, Hβ), (b) a one-hot encoded amino acid type, (c) the value of the chemical shift assigned by FLYA (only if a confident assignment is available, zero otherwise), (d) atom-specific BMRB shift statistics (mean and standard deviation), and (e) 30 chemical shift values obtained from BMRB database fragments. The latter feature is obtained by searching BMRB records for assigned 2–3-residue fragments that match the local protein sequence and have minimal mean-squared-error (MSE) to shifts confidently assigned by FLYA (non-zero values of feature (c) in the local neighborhood of the atom). The edges of the graph correspond to chemical bonds or skip connections. The latter connect the Cβ atom of a given residue with Cβ atoms 2, 3, and 5 residues apart in the amino acid sequence, and are intended to capture possible through-space influence on the chemical shift that is typically observed in secondary structure elements.
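As a small illustration of the skip connections, the sketch below enumerates Cβ–Cβ edges for sequence separations of 2, 3, and 5 residues. The node labelling `("CB", residue_index)` is a hypothetical convention, and residues without a Cβ atom (glycine) are ignored for simplicity.

```python
def skip_edges(n_residues, offsets=(2, 3, 5)):
    """List undirected CB-CB skip connections between residues i and i + offset.

    Assumptions: residues are indexed from 0 and every residue has a CB atom.
    """
    edges = []
    for i in range(n_residues):
        for offset in offsets:
            j = i + offset
            if j < n_residues:
                edges.append((("CB", i), ("CB", j)))
    return edges

# For a 6-residue peptide this gives 3 + 2 + 2 + 1 = 8 skip connections.
edges = skip_edges(6)
```

In a full graph these edges would be added on top of the chemical-bond edges, and each node would carry the feature vector (a) to (e) described above.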

The chemical shift refinement task is defined as a node regression problem, where an expected value of the chemical shift is predicted for each atom that lacks a confident FLYA assignment. This task is addressed with a DeepGCN model 25 , 26 that was trained on 28,400 graphs extracted from 2840 referenced BMRB records 39 . Each training example was created by building a fully assigned graph out of a single BMRB record, and dropping chemical shift values (feature (c) above) for randomly chosen atoms that FLYA typically assigns either with low confidence or inaccurately.

Our DeepGCN model is designed specifically for de novo structure determination, as it uses only the protein sequence and partial shift assignments to estimate values of missing chemical shifts. Its predictions are used to guide the FLYA genetic algorithm optimization 12 by reducing its search range for assignments. The precise final chemical shift value is always determined by the position of a signal in the spectrum, rather than the model prediction alone.

Torsion angle restraints

Before each structure calculation step, torsion angle restraints for the ϕ and ψ angles of the polypeptide backbone were obtained from the current backbone chemical shifts using the program TALOS-N 21. Restraints were only generated if TALOS-N classified the prediction as ‘Good’, ‘Strong’, or ‘Generous’. Given a TALOS-N torsion angle prediction of ϕ ± Δϕ, the allowed range of the torsion angle was set to ϕ ± max(Δϕ, 10°) for ‘Good’ and ‘Strong’ predictions, and ϕ ± 1.5 max(Δϕ, 10°) for ‘Generous’ predictions, and likewise for ψ.
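The rule for converting a prediction into an allowed range can be written as a short function. This is a sketch of the arithmetic only: the function name is hypothetical, angles are not wrapped to the ±180° interval, and the class label "Other" in the example is a placeholder for any classification that does not produce a restraint.

```python
def torsion_range(angle, delta, quality):
    """Return the allowed (lower, upper) range in degrees, or None if no restraint.

    The half-width is max(delta, 10) for 'Good' and 'Strong' predictions and
    1.5 * max(delta, 10) for 'Generous' predictions.
    """
    half_width = max(delta, 10.0)
    if quality in ("Good", "Strong"):
        pass  # half-width is used unchanged
    elif quality == "Generous":
        half_width *= 1.5
    else:
        return None  # no restraint for other classifications
    return (angle - half_width, angle + half_width)

strong = torsion_range(-60.0, 5.0, "Strong")        # uncertainty raised to 10 degrees
generous = torsion_range(-120.0, 20.0, "Generous")  # 1.5 * 20 = 30 degrees
skipped = torsion_range(60.0, 8.0, "Other")         # placeholder class, no restraint
```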

Structure calculation and selection

Given the chemical shift assignments and NOESY cross-peak positions and intensities, the structure is calculated with CYANA 23 using the established method 22 that comprises 7 cycles of NOESY cross-peak assignment and structure calculation, followed by a final structure calculation. In total, 8 × 100 conformers are calculated for a given input data set using 30,000 torsion angle dynamics steps per conformer. The 20 conformers with the lowest final target function value are chosen to represent the solution structure proposal. The entire combined NOESY assignment and structure calculation procedure is executed independently 10 times based on 10 variants of NOESY peak lists, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis. The first set generously includes all signals selected by ResNet with confidence ≥0.05. The other variants of NOESY peak lists follow the same principle with increasingly restrictive confidence thresholds of 0.1, 0.15, …, 0.5.
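The ten peak-list variants follow directly from the confidence thresholds. The sketch below generates the thresholds 0.05, 0.1, ..., 0.5 and filters a toy peak list accordingly; representing a peak as a dictionary with a `confidence` key is an assumption made for illustration.

```python
# Confidence thresholds 0.05, 0.10, ..., 0.50 (rounded to avoid float noise).
thresholds = [round(0.05 * k, 2) for k in range(1, 11)]

def peak_list_variants(peaks, thresholds):
    """Map each threshold to the peaks whose confidence is at least that value."""
    return {t: [p for p in peaks if p["confidence"] >= t] for t in thresholds}

# Toy NOESY peaks with classifier confidences.
peaks = [
    {"id": 1, "confidence": 0.04},
    {"id": 2, "confidence": 0.07},
    {"id": 3, "confidence": 0.30},
    {"id": 4, "confidence": 0.90},
]
variants = peak_list_variants(peaks, thresholds)

# Each variant feeds one structure calculation of 8 x 100 conformers.
conformers_per_calculation = 8 * 100
```

The most permissive variant keeps three of the four toy peaks, while the strictest keeps only one, which mirrors the trade-off between completeness and artefact suppression across the ten calculations.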

The CYANA structure calculations are followed by a structure selection step, wherein the 10 intermediate structure proposals are compared pairwise by a Gradient Boosted Tree (GBT) model that uses 96 features from each structure proposal (including the CYANA target function value 23, number of long-range distance restraints, etc.; for details, see downloads in the Code availability section) to rank the structures by their expected accuracy. The best structure from the ranking is subsequently used as contextual information for the chemical shift refinement cycle (Fig. 1), or returned as the final outcome of ARTINA. The second-best final structure is also returned for comparison.

To train GBT, we collected a set of successful and unsuccessful structure calculations with CYANA. Each training example was a tuple \((s_i, r_i)\), where \(s_i\) is the vector of features extracted from the CYANA structure calculation output, and \(r_i\) is the RMSD of the output structure to the PDB reference. The GBT was trained to take the features \(s_i\) and \(s_j\) of two structure calculations with CYANA as input, and to predict a binary order variable \(o_{ij}\), such that \(o_{ij} = 1\) if \(r_i < r_j\), and 0 otherwise. Importantly, the deposited PDB reference structures were not used directly in the GBT model training (they are used only to calculate the RMSDs). Consequently, the GBT model is unaffected by methodology and technicalities related to PDB deposition (e.g., the structure calculation software used to calculate the deposited reference structure).
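The pairwise formulation can be sketched without any machine learning library. The first function builds ordered training pairs with the binary label defined above. The second turns a pairwise preference function into a ranking by counting wins, which is an assumed aggregation scheme since the text does not state how pairwise predictions are combined. The stand-in preference function simply compares one feature and takes the place of a trained model.

```python
from itertools import permutations

def make_training_pairs(examples):
    """Build ((s_i, s_j), o_ij) pairs from (features, rmsd) examples.

    o_ij is 1 if structure i has the lower RMSD to the reference, else 0.
    """
    return [
        ((s_i, s_j), 1 if r_i < r_j else 0)
        for (s_i, r_i), (s_j, r_j) in permutations(examples, 2)
    ]

def rank_by_pairwise_wins(features, prefers):
    """Rank proposals by how many pairwise comparisons each one wins."""
    wins = [
        sum(prefers(f_i, f_j) for j, f_j in enumerate(features) if j != i)
        for i, f_i in enumerate(features)
    ]
    return sorted(range(len(features)), key=lambda i: wins[i], reverse=True)

# Toy data: one feature per structure (e.g. a target function value) and an RMSD.
examples = [([1.0], 0.8), ([5.0], 3.2), ([2.0], 1.5)]
pairs = make_training_pairs(examples)

# Stand-in for the trained model: prefer the lower feature value.
ranking = rank_by_pairwise_wins(
    [features for features, _ in examples],
    prefers=lambda a, b: a[0] < b[0],
)
```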

Structure accuracy estimate

As an accuracy estimate for the final ARTINA structure, a predicted RMSD to reference (pRMSD) is calculated from the ARTINA results (without knowledge of the reference PDB structure). It aims at reproducing the actual RMSD to reference, which is the RMSD between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C′ in the residue ranges as given in Supplementary Table 4. The predicted RMSD is given by pRMSD = (1 – t) × 4 Å, where, in analogy to the GDT_HA value 46, t is the average fraction of the RMSDs ≤ 0.5, 1, 2, 4 Å between the mean coordinates of the best ARTINA candidate structure bundle and the mean coordinates of the structure bundles of the 9 other structure proposals. Since t ∈ [0, 1], the pRMSD is always in the range of 0–4 Å, grouping all “bad” structures with expected RMSD to reference ≥ 4 Å at pRMSD = 4 Å.
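A worked example may help with the pRMSD formula. The sketch below interprets t as the mean, over the four cutoffs, of the fraction of the 9 pairwise RMSDs that fall at or below each cutoff. This reading follows the GDT_HA analogy and is an interpretation of the description above; the RMSD values are invented for illustration.

```python
CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # in Angstrom

def predicted_rmsd(rmsds_to_best):
    """Compute pRMSD = (1 - t) * 4 from RMSDs between the best and other proposals."""
    fractions = [
        sum(r <= cutoff for r in rmsds_to_best) / len(rmsds_to_best)
        for cutoff in CUTOFFS
    ]
    t = sum(fractions) / len(CUTOFFS)
    return (1.0 - t) * 4.0

# All nine proposals agree closely with the best one: pRMSD = 0.
tight = predicted_rmsd([0.3] * 9)

# No proposal is within 4 Angstrom of the best one: pRMSD = 4.
scattered = predicted_rmsd([5.0] * 9)

# Mixed case: fractions 1/9, 4/9, 6/9, 8/9 give t = 19/36 and pRMSD = 17/9.
mixed = predicted_rmsd([0.4, 0.6, 0.8, 0.9, 1.5, 1.8, 2.5, 3.0, 6.0])
```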

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

Reference structures: PDB Protein Data Bank ( https://www.rcsb.org/ ; accession codes in Fig. 2 and Supplementary Table 3).

Spectra and reference assignments: BMRB Biological Magnetic Resonance Data Bank ( https://bmrb.io/ ; entry IDs in Supplementary Table  3 ).

Peak lists, assignments, and structures: https://nmrtist.org/static/public/publications/artina/ARTINA_results.zip and in the ETH Research Collection under DOI 10.3929/ethz-b-000568621.

Source data for Figs.  2 , 4 , and 5 is available in Supplementary Tables  2 , 4 , and 5, respectively.

Code availability

The ARTINA algorithm is available as a webserver at https://nmrtist.org . pp-ResNet, deconv-ResNet, GNN, and GBT are available for download in binary form, together with architecture schemes, example input data, model input description, and source code for reading the model files and making predictions ( https://github.com/PiotrKlukowski/ARTINA , https://nmrtist.org/static/public/publications/artina/models/ {ARTINA_peak_picking.zip, ARTINA_peak_deconvolution.zip, ARTINA_shift_prediction.zip, ARTINA_structure_ranking.zip}). These files provide a full technical specification of the components developed within ARTINA, and allow for their independent use in Python.

Existing software used: Python ( https://www.python.org/ ), CYANA ( https://www.las.jp/ ), TALOS-N ( https://spin.niddk.nih.gov/bax/software/TALOS-N ).

References

Wüthrich, K. NMR studies of structure and function of biological macromolecules (Nobel Lecture). Angew. Chem. Int. Ed. 42 , 3340–3363 (2003).

Sakakibara, D. et al. Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458 , 102–105 (2009).

Guerry, P. & Herrmann, T. Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44 , 257–309 (2011).

Güntert, P. Automated structure determination from NMR spectra. Eur. Biophys. J. 38 , 129–143 (2009).

Garrett, D. S., Powers, R., Gronenborn, A. M. & Clore, G. M. A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95 , 214–220 (1991).

Koradi, R., Billeter, M., Engeli, M., Güntert, P. & Wüthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135 , 288–297 (1998).

Würz, J. M. & Güntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67 , 63–76 (2017).

Klukowski, P. et al. NMRNet: A deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34 , 2590–2597 (2018).

Li, D. W., Hansen, A. L., Yuan, C. H., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12 , 5229 (2021).

Bartels, C., Güntert, P., Billeter, M. & Wüthrich, K. GARANT—A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18 , 139–149 (1997).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Linge, J. P., O’Donoghue, S. I. & Nilges, M. Automated assignment of ambiguous nuclear overhauser effects with ARIA. Methods Enzymol. 339 , 71–90 (2001).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319 , 209–227 (2002).

Allain, F., Mareuil, F., Ménager, H., Nilges, M. & Bardiaux, B. ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48 , W41–W47 (2020).

Lee, W. et al. I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73, 213–222 (2019).

Huang, Y. P. J. et al. An integrated platform for automated analysis of protein NMR structures. Methods Enzymol. 394 , 111–141 (2005).

Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR 39 , 31–52 (2007).

López-Méndez, B. & Güntert, P. Automated protein structure determination from NMR spectra. J. Am. Chem. Soc. 128 , 13112–13122 (2006).

Murphy, K. P. Probabilistic Machine Learning: An Introduction (MIT Press, 2022).

Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56 , 227–241 (2013).

Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 453–471 (2015).

Güntert, P., Mumenthaler, C. & Wüthrich, K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273 , 283–298 (1997).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).

Chiang, W. L. et al. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD) 257–266 (2019).

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proc. 32nd Conference on Neural Information Processing Systems (NIPS) (2018).

Rosato, A. et al. The second round of Critical Assessment of Automated Structure Determination of Proteins by NMR: CASD-NMR-2013. J. Biomol. NMR 62 , 413–424 (2015).

Kirchner, D. K. & Güntert, P. Objective identification of residue ranges for the superposition of protein structures. BMC Bioinform. 12 , 170 (2011).

Buchner, L. & Güntert, P. Systematic evaluation of combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 81–95 (2015).

Fowler, N. J., Sljoka, A. & Williamson, M. P. A method for validating the accuracy of NMR protein structures. Nat. Commun. 11, 6321 (2020).

Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127 , 1665–1674 (2005).

Buchner, L. & Güntert, P. Increased reliability of nuclear magnetic resonance protein structures by consensus structure bundles. Structure 23 , 425–434 (2015).

Koradi, R., Billeter, M. & Güntert, P. Point-centered domain decomposition for parallel molecular dynamics simulation. Comput. Phys. Commun. 124 , 139–147 (2000).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24 , 171–189 (2002).

Buchner, L., Schmidt, E. & Güntert, P. Peakmatch: A simple and robust method for peak list matching. J. Biomol. NMR 55 , 267–277 (2013).

Scott, A., López-Méndez, B. & Güntert, P. Fully automated structure determinations of the Fes SH2 domain using different sets of NMR spectra. Magn. Reson. Chem. 44 , S83–S88 (2006).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Goddard, T. D. & Kneller, D. G. Sparky 3. (University of California, San Francisco, 2001).

Delaglio, F. et al. NMRPipe—A multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR 6 , 277–293 (1995).

Bartels, C., Xia, T. H., Billeter, M., Güntert, P. & Wüthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 6 , 1–10 (1995).

Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. Proc. Mach. Learn. Res. 9 , 249–256 (2010).

Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).

Davies, E. R. Computer Vision (Academic Press, 2018).

Kryshtafovych, A. et al. New tools and expanded data analysis capabilities at the protein structure prediction center. Proteins 69 , 19–26 (2007).

Acknowledgements

We thank Drs. Frédéric Allain, Fred Damberger, Hideo Iwai, Harindranath Kadavath, Julien Orts, and Dean Strotz for providing unpublished spectra. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 891690 (P.K.), and a Grant-in-Aid for Scientific Research of the Japan Society for the Promotion of Science (P.G., 20K06508).

Author information

Authors and affiliations.

Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland

Piotr Klukowski, Roland Riek & Peter Güntert

Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany

Peter Güntert

Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397, Tokyo, Japan

Contributions

P.K. prepared training and test data sets, designed and trained machine learning models, performed experiments described in the manuscript, and implemented ARTINA within the nmrtist.org web platform. P.K. and P.G. wrote the software. P.K., R.R., and P.G. conceived the project, analyzed the results, and wrote the manuscript.

Corresponding authors

Correspondence to Piotr Klukowski, Roland Riek or Peter Güntert.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Benjamin Bardiaux, Gaetano Montelione, Theresa Ramelot, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.  Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary info file #1, description of additional supplementary files, supplementary movies 1–5, reporting summary, and peer review file.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat Commun 13 , 6151 (2022). https://doi.org/10.1038/s41467-022-33879-5

Received: 28 March 2022

Accepted: 30 September 2022

Published: 18 October 2022

DOI: https://doi.org/10.1038/s41467-022-33879-5


Physical Chemistry Chemical Physics

¹H and ¹³C Chemical Shift–Structure Effects in Anhydrous β-Caffeine and Four Caffeine–Diacid Cocrystals Probed by Solid-State NMR Experiments and DFT Calculations

By using density functional theory (DFT) calculations, we refined the H atom positions in the structures of β-caffeine (C), α-oxalic acid (OA; (COOH)₂), α-(COOH)₂⋅2H₂O, β-malonic acid (MA), β-glutaric acid (GA), and I-maleic acid (ME), along with their corresponding cocrystals of 2:1 (2C–OA, 2C–MA) or 1:1 (C–GA, C–ME) stoichiometry. The corresponding ¹³C/¹H chemical shifts obtained by gauge-including projector augmented wave (GIPAW) calculations agreed overall very well with results from magic-angle-spinning (MAS) nuclear magnetic resonance (NMR) spectroscopy experiments. Chemical-shift/structure trends of the precursors and cocrystals were examined, where good linear correlations resulted for all COO¹H sites against the H⋯O and/or H⋯N H-bond distance, whereas a general correlation was neither found for the aliphatic/caffeine-stemming ¹H sites nor any ¹³C chemical shift against either the intermolecular hydrogen- or tetrel-bond distance, except for the ¹³COOH sites of the 2C–OA, 2C–MA, and C–GA cocrystals, which are involved in a strong COOH⋯N bond with caffeine that is responsible for the main supramolecular stabilization of the cocrystal. We provide the first complete ¹³C NMR spectral assignment of the structurally disordered anhydrous β-caffeine polymorph. The results are discussed in relation to previous literature on the disordered α-caffeine polymorph and the ordered hydrated counterpart, along with recommendations for NMR experimentation that will secure sufficient ¹³C signal-resolution for reliable resonance/site assignments.

¹H and ¹³C Chemical Shift–Structure Effects in Anhydrous β-Caffeine and Four Caffeine–Diacid Cocrystals Probed by Solid-State NMR Experiments and DFT Calculations

D. Majhi, B. Stevensson, T. M. Nguyen and M. Edén, Phys. Chem. Chem. Phys., 2024, Accepted Manuscript, DOI: 10.1039/D3CP06197C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence . You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

COMMENTS

  1. PDF Chapter 13: Nuclear Magnetic Resonance (NMR) Spectroscopy

    Integration of ¹H NMR resonances: The area under an NMR resonance is proportional to the number of nuclei that give rise to that resonance. The relative area under the resonances at δ = 3.6 and δ = 1.2 is 1:3 (integrals of 2 and 6). The integral is superimposed over the spectrum as a "stair-step" line.

  2. PDF Chapter 1 INTRODUCTION TO NMR SPECTROSCOPY

    Table 1.2. Properties of NMR Active Nuclei. Nucleus¹, γ (rad·s⁻¹·gauss⁻¹)†, spin I, natural abundance (%): ¹H, 26,753, 1/2, 99.980; ²H, 4,106, 1, 0.016; ¹⁹F, 25,179, 1/2, 100.000²; ¹³C, 6,728, 1/2, 1.108³; ¹⁵N, −2,712, 1/2, 0.373; ³¹P, 10,841, 1/2, 100.00. ¹The term "Protons" is used interchangeably with ¹H in the text. ²Fluorine is not normally found in biopolymers, therefore it has to ...

  3. PDF NMR assignment

    NMR assignment by Roy Hoffman, 2006. Gradient pulses: It is possible to apply a magnetic gradient to the sample. A gradient affects the signal in the following manner. At the start of the experiment it disperses the signal, making it disappear. Then the application of a gradient in the opposite direction allows the signal to be seen again.

  4. PDF Nuclear Magnetic Resonance: An Introduction

    Nuclear magnetic resonance or NMR is one of the most widely used discoveries of Modern Physics. NMR is based on the bulk magnetic properties of materials made up of certain isotopes, most notably, protons (¹H), but encompassing a wide variety of species including ¹³C, ¹⁹F, and ²⁹Si. NMR is used to measure magnetic fields with exquisite precision.

  5. PDF Basic NMR Concepts

    From A.E. Derome, Modern NMR Techniques for Chemistry Research (1987). Basics of FT NMR: Six Critical Parameters. This section will give you enough information about FT-NMR experiments to avoid the most common errors. We will cover the most important parameters that affect any spectrum you may collect using an FT-NMR spectrometer. These are: 1.

  6. PDF NMR (Nuclear magnetic resonance) spectroscopy

    NMR is a technique that is used to analyze the structure of many chemical substances, primarily organic and inorganic compounds. Place the sample in a static magnetic field. Excite nuclei in the sample with a radio frequency pulse. Measure the frequency of the signals emitted by the sample. Using NMR spectroscopy, information about the bonding ...

  7. PDF Understanding NMR Spectroscopy

    Chapter 3 introduces the vector model of NMR. This model has its limitations, but it is very useful for understanding how pulses excite NMR signals. We can also use the vector model to understand the basic, but very important, NMR experiments such as pulse-acquire, inversion recovery and most importantly the spin echo.

  8. Assignment of 1H-NMR spectra

    On this page we will deal with how to interpret an NMR spectrum. The meaning of assignment in the title is to assign each peak to a proton in the molecule under investigation. The examples here are of 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di t butylbenzo[g]chrysene and cholesteryl acetate.

  9. PDF Biochemistry 530 NMR Theory and Practice

    Lecturer: Gabriele Varani Biochemistry and Chemistry Room J479 and Bagley 63 Phone: 543 7113 Email: [email protected]. Office Hours by arrangement. Lecture 1: Basic Principles of NMR Lecture 2: 2D NMR Lecture 3: NMR assignments/structure determination Lecture 4: 2D and 3D heteronuclear NMR.

  10. NMR: Structural Assignment

    This page describes what a proton NMR spectrum is and how it tells you useful things about the hydrogen atoms in organic molecules. Assignment of structures is a central problem which NMR is well suited to address. Explains how both ¹³C NMR spectra and low and high resolution proton NMR spectra can be used to help to work out the …

  11. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    This page titled 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Serge L. Smirnov and James McCarty. In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra.

  12. PDF Assignment Nucleic Acids Generating Structural Restraints from Nuclear

    Nuclear magnetic resonance spectroscopy (NMR) enables the determination of the three-dimensional structure of proteins and nucleic acids in solution. A set of NMR experiments identifies the residues in the molecule and determines the order of residues that comprises the primary structure. Distance and angular relationships are subsequently ...

  13. PDF A novel strategy for NMR resonance assignment and protein ...

    the "sequential assignment followed by NOE-derived structure" paradigm. To this end, we describe a method for full ¹³C/¹⁵N/¹H protein NMR assignments that integrate four key features to overcome many of the limitations often encountered using these traditional approaches. First, we make use of a min-

  14. Protein NMR Resonance Assignment

    Sequential Resonance Assignment Strategy. Here, the sequential assignment strategy based on the analysis of ¹H 2D-NMR spectra of proteins will be briefly outlined (Wüthrich 1986). This method is comprised of two stages. The first involves the identification of the spin systems of amino acid residues.

  15. PDF Russell S. Davis and Peter F. Flynn

    This further confirms our previous assignments. Exact chemical shifts will be given in the ¹³C 1D NMR spectrum. gHMBC Analysis: The gHMBC spectrum was recorded on a 500 MHz (¹H) spectrometer with a spectral width of 6510.45 Hz and 1024 complex points in the direct (¹H/F2) dimension, and a spectral width of 23584.9 Hz and 512.

  16. NMR Methods for Stereochemical Assignments

    Abstract. Various NMR techniques have been developed that allow assignment of the relative and absolute configuration of many of the stereogenic carbons that occur in marine natural products. The application of chiral anisotropic reagents in conjunction with NMR analyses has been particularly useful for determining the absolute configuration of ...

  17. PDF A guide to small-molecule structure assignment through computation of

    For a typical structure assignment of a small organic molecule (e.g., fewer than ~10 non-H atoms or up to ~180 a.m.u. and ~20 conformers), this protocol can be completed in ~2 h of active effort

  18. PDF Automatic structure-based NMR methyl resonance assignment in large proteins

    The major bottleneck for NMR studies with selective methyl-labeled proteins is the resonance assignment, i.e. relating ¹H/¹³C signals in the NMR spectra to specific methyl groups in the protein ...

  19. Tutorials

    SH3 Assignment NMR Tutorial. Learn how to use AnalysisAssign to make manual assignments in solid-state NMR spectra, and assign a small protein using [1,3]-¹³C and [2]-¹³C labelled glycerol samples. Download PDF

  20. (PDF) Complete NMR assignment of (+)-10β,14-dihydroxy ...

    PDF | The assignment of 1H and 13C NMR of the sesquiterpene (+)-10β,14-dihydroxy-allo-aromadendrane by means of two-dimensional NMR is reported.... | Find, read and cite all the research you need ...

  21. Rapid protein assignments and structures from raw NMR spectra with the

    However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist.

  22. (PDF) NMR Assignment through Linear Programming

    This paper introduces LIAN (LInear programming Assignment for NMR), a novel linear programming formulation of the problem which yields state-of-the-art results in simulated and experimental datasets.

  23. NMR assignment 2 LM new No Names.pdf

    CHEM 2510 LAB NMR ASSIGNMENT 2, 2/27/24, Southeast Community College, L. Malmgren. 2. For each of the following chemical structures a ¹H NMR spectrum is provided in the following pages. Match each compound with its ¹H NMR spectrum. Draw the structure near the spectrum and assign the chemical shifts using a, b, c … for all of the protons on the molecule. a.

  24. [PDF] Automated analysis of NMR assignments and structures for proteins

    A mean-field method that reports the resonance assignments in a probabilistic fashion, displaying the certainty of assignments in an unambiguous and quantitative manner was applied to the NMR data of the 172-residue peptide-binding domain of the E. coli heat-shock protein, DnaK.

  25. ¹H and ¹³C Chemical Shift–Structure Effects in Anhydrous β-Caffeine

    We provide the first complete ¹³C NMR spectral assignment of the structurally disordered anhydrous β-caffeine polymorph. The results are discussed in relation to previous literature on the disordered α-caffeine polymorph and the ordered hydrated counterpart, along with recommendations for NMR experimentation that will secure sufficient ...
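
As a rough illustration of the integration rule quoted in entry 1 of the list above (the area under a resonance is proportional to the number of nuclei giving rise to it), the sketch below converts relative integral areas into proton counts. The function name `protons_from_integrals` and the total of 8 protons are assumptions made for this example; only the 1:3 area ratio comes from the quoted text.

```python
def protons_from_integrals(areas, total_protons):
    """Scale relative integral areas so that they sum to the known proton count.

    areas         -- relative areas read off the integral trace (any units)
    total_protons -- number of protons in the molecule (assumed to be known,
                     e.g. from the molecular formula)
    """
    total_area = sum(areas)
    # Each resonance gets a share of the protons proportional to its area;
    # rounding absorbs small integration errors.
    return [round(area / total_area * total_protons) for area in areas]


# Areas in a 1:3 ratio, as for the resonances at δ = 3.6 and δ = 1.2.
# The total of 8 protons is a hypothetical value chosen for illustration.
counts = protons_from_integrals([1.0, 3.0], total_protons=8)
print(counts)
```

Real integrals carry errors of a few percent, so the rounded counts should be checked against the molecular formula before being used for an assignment.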