• Methodology
  • Open access
  • Published: 11 October 2016

Reviewing the research methods literature: principles and strategies illustrated by a systematic overview of sampling in qualitative research

  • Stephen J. Gentles 1,4,
  • Cathy Charles 1,
  • David B. Nicholas 2,
  • Jenny Ploeg 3 &
  • K. Ann McKibbon 1

Systematic Reviews volume 5, Article number: 172 (2016)


Abstract

Background

Overviews of methods are potentially useful means to increase clarity and enhance collective understanding of specific methods topics that may be characterized by ambiguity, inconsistency, or a lack of comprehensiveness. This type of review represents a distinct literature synthesis method, although to date, its methodology remains relatively undeveloped despite several aspects that demand unique review procedures. The purpose of this paper is to initiate discussion about what a rigorous systematic approach to reviews of methods, referred to here as systematic methods overviews, might look like by providing tentative suggestions for approaching specific challenges likely to be encountered. The guidance offered here was derived from experience conducting a systematic methods overview on the topic of sampling in qualitative research.

Results

The guidance is organized into several principles that highlight specific objectives for this type of review given the common challenges that must be overcome to achieve them. Optional strategies for achieving each principle are also proposed, along with discussion of how they were successfully implemented in the overview on sampling. We describe seven paired principles and strategies that address the following aspects: delimiting the initial set of publications to consider, searching beyond standard bibliographic databases, searching without the availability of relevant metadata, selecting publications on purposeful conceptual grounds, defining concepts and other information to abstract iteratively, accounting for inconsistent terminology used to describe specific methods topics, and generating rigorous verifiable analytic interpretations. Since a broad aim in systematic methods overviews is to describe and interpret the relevant literature in qualitative terms, we suggest that iterative decision making at various stages of the review process and a rigorous qualitative approach to analysis are necessary features of this review type.

Conclusions

We believe that the principles and strategies provided here will be useful to anyone choosing to undertake a systematic methods overview. This paper represents an initial effort to promote high-quality critical evaluations of the literature regarding problematic methods topics, which have the potential to promote clearer, shared understandings and accelerate advances in research methods. Further work is warranted to develop more definitive guidance.

Background

While reviews of methods are not new, they represent a distinct review type whose methodology remains relatively under-addressed in the literature despite the clear implications for unique review procedures. One of the few examples to describe it is a chapter containing reflections of two contributing authors in a book of 21 reviews on methodological topics compiled for the British National Health Service, Health Technology Assessment Program [1]. Notable is their observation of how the differences between the methods reviews and conventional quantitative systematic reviews, specifically attributable to their varying content and purpose, have implications for defining what qualifies as systematic. While the authors describe general aspects of “systematicity” (including rigorous application of a methodical search, abstraction, and analysis), they also describe a high degree of variation within the category of methods reviews itself and so offer little in the way of concrete guidance. In this paper, we present tentative concrete guidance, in the form of a preliminary set of proposed principles and optional strategies, for a rigorous systematic approach to reviewing and evaluating the literature on quantitative or qualitative methods topics. For purposes of this article, we have used the term systematic methods overview to emphasize the notion of a systematic approach to such reviews.

The conventional focus of rigorous literature reviews (i.e., review types for which systematic methods have been codified, including the various approaches to quantitative systematic reviews [ 2 – 4 ], and the numerous forms of qualitative and mixed methods literature synthesis [ 5 – 10 ]) is to synthesize empirical research findings from multiple studies. By contrast, the focus of overviews of methods, including the systematic approach we advocate, is to synthesize guidance on methods topics. The literature consulted for such reviews may include the methods literature, methods-relevant sections of empirical research reports, or both. Thus, this paper adds to previous work published in this journal—namely, recent preliminary guidance for conducting reviews of theory [ 11 ]—that has extended the application of systematic review methods to novel review types that are concerned with subject matter other than empirical research findings.

Published examples of methods overviews illustrate the varying objectives they can have. One objective is to establish methodological standards for appraisal purposes. For example, reviews of existing quality appraisal standards have been used to propose universal standards for appraising the quality of primary qualitative research [12] or evaluating qualitative research reports [13]. A second objective is to survey the methods-relevant sections of empirical research reports to establish current practices on methods use and reporting practices, which Moher and colleagues [14] recommend as a means for establishing the needs to be addressed in reporting guidelines (see, for example [15, 16]). A third objective for a methods review is to offer clarity and enhance collective understanding regarding a specific methods topic that may be characterized by ambiguity, inconsistency, or a lack of comprehensiveness within the available methods literature. An example of this is an overview whose objective was to review the inconsistent definitions of intention-to-treat analysis (the methodologically preferred approach to analyzing randomized controlled trial data) that have been offered in the methods literature and to propose a solution for improving conceptual clarity [17]. Such reviews are warranted because students and researchers who must learn or apply research methods typically lack the time to systematically search, retrieve, review, and compare the available literature to develop a thorough and critical sense of the varied approaches regarding certain controversial or ambiguous methods topics.

While systematic methods overviews , as a review type, include both reviews of the methods literature and reviews of methods-relevant sections from empirical study reports, the guidance provided here is primarily applicable to reviews of the methods literature since it was derived from the experience of conducting such a review [ 18 ], described below. To our knowledge, there are no well-developed proposals on how to rigorously conduct such reviews. Such guidance would have the potential to improve the thoroughness and credibility of critical evaluations of the methods literature, which could increase their utility as a tool for generating understandings that advance research methods, both qualitative and quantitative. Our aim in this paper is thus to initiate discussion about what might constitute a rigorous approach to systematic methods overviews. While we hope to promote rigor in the conduct of systematic methods overviews wherever possible, we do not wish to suggest that all methods overviews need be conducted to the same standard. Rather, we believe that the level of rigor may need to be tailored pragmatically to the specific review objectives, which may not always justify the resource requirements of an intensive review process.

Methods

The example systematic methods overview on sampling in qualitative research

The principles and strategies we propose in this paper are derived from experience conducting a systematic methods overview on the topic of sampling in qualitative research [18]. The main objective of that methods overview was to bring clarity to, and a deeper understanding of, the prominent concepts related to sampling in qualitative research (purposeful sampling strategies, saturation, etc.). Specifically, we interpreted the available guidance, commenting on areas lacking clarity, consistency, or comprehensiveness (without proposing any recommendations on how to do sampling). This was achieved by a comparative and critical analysis of publications representing the most influential (i.e., highly cited) guidance across several methodological traditions in qualitative research.

The specific methods and procedures for the overview on sampling [18] from which our proposals are derived were developed by soliciting initial input from local experts in qualitative research and an expert health librarian (KAM), and through careful, ongoing deliberation throughout the review process. To summarize, in that review, we employed a transparent and rigorous approach to search the methods literature, selected publications for inclusion according to a purposeful and iterative process, abstracted textual data using structured abstraction forms, and analyzed (synthesized) the data using a systematic multi-step approach featuring abstraction of text, summary of information in matrices, and analytic comparisons.

For this article, we reflected on both the problems and challenges encountered at different stages of the review and our means for selecting justifiable procedures to deal with them. Several principles were then derived by considering the generic nature of these problems, while the generalizable aspects of the procedures used to address them formed the basis of optional strategies. Further details of the specific methods and procedures used in the overview on qualitative sampling are provided below to illustrate both the types of objectives and challenges that reviewers will likely need to consider and our approach to implementing each of the principles and strategies.

Organization of the guidance into principles and strategies

For the purposes of this article, principles are general statements outlining what we propose are important aims or considerations within a particular review process, given the unique objectives or challenges to be overcome with this type of review. These statements follow the general format, “considering the objective or challenge of X, we propose Y to be an important aim or consideration.” Strategies are optional and flexible approaches for implementing the preceding principle. Thus, generic challenges give rise to principles, which in turn give rise to strategies.

We organize the principles and strategies below into three sections corresponding to processes characteristic of most systematic literature synthesis approaches: literature identification and selection ; data abstraction from the publications selected for inclusion; and analysis , including critical appraisal and synthesis of the abstracted data. Within each section, we also describe the specific methodological decisions and procedures used in the overview on sampling in qualitative research [ 18 ] to illustrate how the principles and strategies for each review process were applied and implemented in a specific case. We expect this guidance and accompanying illustrations will be useful for anyone considering engaging in a methods overview, particularly those who may be familiar with conventional systematic review methods but may not yet appreciate some of the challenges specific to reviewing the methods literature.

Results and discussion

Literature identification and selection

The identification and selection process includes search and retrieval of publications and the development and application of inclusion and exclusion criteria to select the publications that will be abstracted and analyzed in the final review. Literature identification and selection for overviews of the methods literature is challenging and potentially more resource-intensive than for most reviews of empirical research. This is true for several reasons that we describe below, alongside discussion of the potential solutions. Additionally, we suggest in this section how the selection procedures can be chosen to match the specific analytic approach used in methods overviews.

Delimiting a manageable set of publications

One aspect of methods overviews that can make identification and selection challenging is the fact that the universe of literature containing potentially relevant information regarding most methods-related topics is expansive and often unmanageably so. Reviewers are faced with two large categories of literature: the methods literature , where the possible publication types include journal articles, books, and book chapters; and the methods-relevant sections of empirical study reports , where the possible publication types include journal articles, monographs, books, theses, and conference proceedings. In our systematic overview of sampling in qualitative research, exhaustively searching (including retrieval and first-pass screening) all publication types across both categories of literature for information on a single methods-related topic was too burdensome to be feasible. The following proposed principle follows from the need to delimit a manageable set of literature for the review.

Principle #1:

Considering the broad universe of potentially relevant literature, we propose that an important objective early in the identification and selection stage is to delimit a manageable set of methods-relevant publications in accordance with the objectives of the methods overview.

Strategy #1:

To limit the set of methods-relevant publications that must be managed in the selection process, reviewers have the option to initially review only the methods literature, and exclude the methods-relevant sections of empirical study reports, provided this aligns with the review’s particular objectives.

We propose that reviewers are justified in choosing to select only the methods literature when the objective is to map out the range of recognized concepts relevant to a methods topic, to summarize the most authoritative or influential definitions or meanings for methods-related concepts, or to demonstrate a problematic lack of clarity regarding a widely established methods-related concept and potentially make recommendations for a preferred approach to the methods topic in question. For example, in the case of the methods overview on sampling [ 18 ], the primary aim was to define areas lacking in clarity for multiple widely established sampling-related topics. In the review on intention-to-treat in the context of missing outcome data [ 17 ], the authors identified a lack of clarity based on multiple inconsistent definitions in the literature and went on to recommend separating the issue of how to handle missing outcome data from the issue of whether an intention-to-treat analysis can be claimed.

In contrast to strategy #1, it may be appropriate to select the methods-relevant sections of empirical study reports when the objective is to illustrate how a methods concept is operationalized in research practice or reported by authors. For example, one could review all the publications in 2 years’ worth of issues of five high-impact field-related journals to answer questions about how researchers describe implementing a particular method or approach, or to quantify how consistently they define or report using it. Such reviews are often used to highlight gaps in the reporting practices regarding specific methods, which may be used to justify items to address in reporting guidelines (for example, [ 14 – 16 ]).

It is worth recognizing that other authors have advocated positions broader than ours regarding the scope of literature to be considered in a review. Suri [10] (who, like us, emphasizes how different sampling strategies are suitable for different literature synthesis objectives) has, for example, described a two-stage literature sampling procedure (pp. 96–97). First, reviewers use an initial approach to conduct a broad overview of the field—for reviews of methods topics, this would entail an initial review of the research methods literature. This is followed by a second, more focused stage in which practical examples are purposefully selected—for methods reviews, this would involve sampling the empirical literature to illustrate key themes and variations. While this approach is seductive in its capacity to generate more in-depth and interpretive analytic findings, some reviewers may consider it too resource-intensive to include the second step no matter how selective the purposeful sampling. In the overview on sampling, where we stopped after the first stage [18], we discussed our selective focus on the methods literature as a limitation that left opportunities for further analysis of the literature. We explicitly identified theoretical sampling, for example, as a topic for which a future review of the methods sections of empirical reports would be justified to answer specific questions identified in the primary review.

Ultimately, reviewers must make pragmatic decisions that balance resource considerations, combined with informed predictions about the depth and complexity of literature available on their topic, with the stated objectives of their review. The remaining principles and strategies apply primarily to overviews that include the methods literature, although some aspects may be relevant to reviews that include empirical study reports.

Searching beyond standard bibliographic databases

An important reality affecting identification and selection in overviews of the methods literature is the increased likelihood that relevant publications will be located in sources other than journal articles (which is usually not the case for overviews of empirical research, where journal articles generally represent the primary publication type). In the overview on sampling [18], out of 41 full-text publications retrieved and reviewed, only 4 were journal articles, while 37 were books or book chapters. Since many books and book chapters were not available electronically, their full text had to be retrieved in hardcopy, and 11 publications were obtainable only through interlibrary loan or purchase request. The tasks associated with such retrieval are substantially more time-consuming than electronic retrieval. Since a substantial proportion of methods-related guidance may be located in publication types that are less comprehensively indexed in standard bibliographic databases, identification and retrieval thus become complicated processes.

Principle #2:

Considering that important sources of methods guidance can be located in non-journal publication types (e.g., books, book chapters) that tend to be poorly indexed in standard bibliographic databases, it is important to consider alternative search methods for identifying relevant publications to be further screened for inclusion.

Strategy #2:

To identify books, book chapters, and other non-journal publication types not thoroughly indexed in standard bibliographic databases, reviewers may choose to consult one or more of the following less standard sources: Google Scholar, publisher web sites, or expert opinion.

In the case of the overview on sampling in qualitative research [18], Google Scholar had two advantages over other standard bibliographic databases: it indexes and returns records of books and book chapters likely to contain guidance on qualitative research methods topics; and it has been validated as providing higher citation counts than ISI Web of Science (a producer of numerous bibliographic databases accessible through institutional subscription) for several non-biomedical disciplines, including the social sciences, where qualitative research methods are prominently used [19–21]. While we identified numerous useful publications by consulting experts, the author publication lists generated through Google Scholar searches were uniquely useful for identifying more recent editions of methods books identified by experts.

Searching without relevant metadata

Determining what publications to select for inclusion in the overview on sampling [ 18 ] could only rarely be accomplished by reviewing the publication’s metadata. This was because for the many books and other non-journal type publications we identified as possibly relevant, the potential content of interest would be located in only a subsection of the publication. In this common scenario for reviews of the methods literature (as opposed to methods overviews that include empirical study reports), reviewers will often be unable to employ standard title, abstract, and keyword database searching or screening as a means for selecting publications.

Principle #3:

Considering that the presence of information about the topic of interest may not be indicated in the metadata for books and similar publication types, it is important to consider other means of identifying potentially useful publications for further screening.

Strategy #3:

One approach to identifying potentially useful books and similar publication types is to consider what classes of such publications (e.g., all methods manuals for a certain research approach) are likely to contain relevant content, then identify, retrieve, and review the full text of corresponding publications to determine whether they contain information on the topic of interest.

In the example of the overview on sampling in qualitative research [18], the topic of interest (sampling) was one of numerous topics covered in the general qualitative research methods manuals. Consequently, examples from this class of publications first had to be identified for retrieval according to non-keyword-dependent criteria. Thus, all methods manuals within the three research traditions reviewed (grounded theory, phenomenology, and case study) that might contain discussion of sampling were sought through Google Scholar and expert opinion; their full text was then obtained and hand-searched for relevant content to determine eligibility. We used tables of contents and index sections of books to aid this hand searching.

Purposefully selecting literature on conceptual grounds

A final consideration in methods overviews relates to the type of analysis used to generate the review findings. Unlike quantitative systematic reviews where reviewers aim for accurate or unbiased quantitative estimates—something that requires identifying and selecting the literature exhaustively to obtain all relevant data available (i.e., a complete sample)—in methods overviews, reviewers must describe and interpret the relevant literature in qualitative terms to achieve review objectives. In other words, the aim in methods overviews is to seek coverage of the qualitative concepts relevant to the methods topic at hand. For example, in the overview of sampling in qualitative research [ 18 ], achieving review objectives entailed providing conceptual coverage of eight sampling-related topics that emerged as key domains. The following principle recognizes that literature sampling should therefore support generating qualitative conceptual data as the input to analysis.

Principle #4:

Since the analytic findings of a systematic methods overview are generated through qualitative description and interpretation of the literature on a specified topic, selection of the literature should be guided by a purposeful strategy designed to achieve adequate conceptual coverage (i.e., representing an appropriate degree of variation in relevant ideas) of the topic according to objectives of the review.

Strategy #4:

One strategy for choosing the purposeful approach to use in selecting the literature according to the review objectives is to consider whether those objectives imply exploring concepts either at a broad overview level, in which case combining maximum variation selection with a strategy that limits yield (e.g., critical case, politically important, or sampling for influence—described below) may be appropriate; or in depth, in which case purposeful approaches aimed at revealing innovative cases will likely be necessary.

In the methods overview on sampling, the implied scope was broad since we set out to review publications on sampling across three divergent qualitative research traditions—grounded theory, phenomenology, and case study—to facilitate making informative conceptual comparisons. Such an approach would be analogous to maximum variation sampling.

At the same time, the purpose of that review was to critically interrogate the clarity, consistency, and comprehensiveness of literature from these traditions that was “most likely to have widely influenced students’ and researchers’ ideas about sampling” (p. 1774) [ 18 ]. In other words, we explicitly set out to review and critique the most established and influential (and therefore dominant) literature, since this represents a common basis of knowledge among students and researchers seeking understanding or practical guidance on sampling in qualitative research. To achieve this objective, we purposefully sampled publications according to the criterion of influence , which we operationalized as how often an author or publication has been referenced in print or informal discourse. This second sampling approach also limited the literature we needed to consider within our broad scope review to a manageable amount.

To operationalize this strategy of sampling for influence, we sought to identify both the most influential authors within a qualitative research tradition (all of whose citations were subsequently screened) and the most influential publications on the topic of interest by non-influential authors. This involved a flexible approach that combined multiple indicators of influence to avoid the dilemma that any single indicator might provide inadequate coverage. These indicators included bibliometric data (h-index for author influence [22]; number of citations for publication influence), expert opinion, and cross-references in the literature (i.e., snowball sampling). As a final selection criterion, a publication was included only if it made an original contribution in terms of novel guidance regarding sampling or a related concept; thus, purely secondary sources were excluded. Publish or Perish software (Anne-Wil Harzing; available at http://www.harzing.com/resources/publish-or-perish) was used to generate bibliometric data via the Google Scholar database. Figure 1 illustrates how identification and selection in the methods overview on sampling was a multi-faceted and iterative process. The authors selected as influential, and the publications selected for inclusion or exclusion, are listed in Additional file 1 (Matrices 1, 2a, 2b).

Fig. 1 Literature identification and selection process used in the methods overview on sampling [18]
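To make the multi-indicator logic concrete, the following minimal Python sketch shows the union-of-thresholds screening it implies. This is our illustration, not the authors' implemented procedure: the candidate names, indicator values, and cut-offs are all hypothetical, and in practice the inputs would come from bibliometric tools such as Publish or Perish, expert consultation, and snowball sampling, with thresholds set judgmentally rather than mechanically.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """An author or publication being screened for influence (hypothetical)."""
    name: str
    h_index: int = 0             # author-level bibliometric indicator [22]
    citations: int = 0           # publication-level bibliometric indicator
    expert_nominations: int = 0  # times named by consulted experts
    cross_references: int = 0    # times cited by already-included publications

def is_influential(c, h_min=20, cites_min=500, experts_min=1, xrefs_min=3):
    # Indicators are combined as a union (any one suffices) rather than a
    # weighted score, so that the blind spots of a single indicator do not
    # exclude an influential source.
    return (c.h_index >= h_min
            or c.citations >= cites_min
            or c.expert_nominations >= experts_min
            or c.cross_references >= xrefs_min)

candidates = [
    Candidate("Author A", h_index=35, citations=12000),
    Candidate("Methods book B", citations=150, expert_nominations=2),
    Candidate("Chapter C", citations=40, cross_references=4),
    Candidate("Chapter D", citations=40),  # meets no threshold: screened out
]
shortlist = [c.name for c in candidates if is_influential(c)]
print(shortlist)  # ['Author A', 'Methods book B', 'Chapter C']
```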

In summary, the strategies of seeking maximum variation and sampling for influence were employed in the sampling overview to meet the specific review objectives described. Reviewers will need to consider the full range of purposeful literature sampling approaches at their disposal in deciding what best matches the specific aims of their own reviews. Suri [ 10 ] has recently retooled Patton’s well-known typology of purposeful sampling strategies (originally intended for primary research) for application to literature synthesis, providing a useful resource in this respect.

Data abstraction

The purpose of data abstraction in rigorous literature reviews is to locate and record all data relevant to the topic of interest from the full text of included publications, making them available for subsequent analysis. Conventionally, a data abstraction form—consisting of numerous distinct conceptually defined fields to which corresponding information from the source publication is recorded—is developed and employed. There are several challenges, however, to the processes of developing the abstraction form and abstracting the data itself when conducting methods overviews, which we address here. Some of these problems and their solutions may be familiar to those who have conducted qualitative literature syntheses, which are similarly conceptual.

Iteratively defining conceptual information to abstract

In the overview on sampling [ 18 ], while we surveyed multiple sources beforehand to develop a list of concepts relevant for abstraction (e.g., purposeful sampling strategies, saturation, sample size), there was no way for us to anticipate some concepts prior to encountering them in the review process. Indeed, in many cases, reviewers are unable to determine the complete set of methods-related concepts that will be the focus of the final review a priori without having systematically reviewed the publications to be included. Thus, defining what information to abstract beforehand may not be feasible.

Principle #5:

Considering the potential impracticality of defining a complete set of relevant methods-related concepts from a body of literature one has not yet systematically read, selecting and defining fields for data abstraction must often be undertaken iteratively. Thus, concepts to be abstracted can be expected to grow and change as data abstraction proceeds.

Strategy #5:

Reviewers can develop an initial form or set of concepts for abstraction purposes according to standard methods (e.g., incorporating expert feedback, pilot testing) and remain attentive to the need to iteratively revise it as concepts are added or modified during the review. Reviewers should document revisions and return to re-abstract data from previously abstracted publications as the new data requirements are determined.

In the sampling overview [ 18 ], we developed and maintained the abstraction form in Microsoft Word. We derived the initial set of abstraction fields from our own knowledge of relevant sampling-related concepts, consultation with local experts, and reviewing a pilot sample of publications. Since the publications in this review included a large proportion of books, the abstraction process often began by flagging the broad sections within a publication containing topic-relevant information for detailed review to identify text to abstract. When reviewing flagged text, the reviewer occasionally encountered an unanticipated concept significant enough to warrant being added as a new field to the abstraction form. For example, a field was added to capture how authors described the timing of sampling decisions, whether before (a priori) or after (ongoing) starting data collection, or whether this was unclear. In these cases, we systematically documented the modification to the form and returned to previously abstracted publications to abstract any information that might be relevant to the new field.

The logic of this strategy is analogous to the logic used in a form of research synthesis called best fit framework synthesis (BFFS) [ 23 – 25 ]. In that method, reviewers initially code evidence using an a priori framework they have selected. When evidence cannot be accommodated by the selected framework, reviewers then develop new themes or concepts from which they construct a new expanded framework. Both the strategy proposed and the BFFS approach to research synthesis are notable for their rigorous and transparent means to adapt a final set of concepts to the content under review.
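As a minimal sketch of the bookkeeping strategy #5 implies (the authors used a Microsoft Word form, not code; the field names and publication labels here are hypothetical), adding a field mid-review can be paired with logging the revision and queuing previously abstracted publications for re-abstraction:

```python
from datetime import date

# Current abstraction form: field name -> working description of the field.
form_fields = {
    "purposeful_sampling": "Strategies for deliberately selecting cases",
    "saturation": "Criteria given for stopping data collection",
}
abstracted = ["Publication A", "Publication B"]  # already fully abstracted
revision_log = []          # documented changes to the form
reabstraction_queue = []   # publications to revisit for new fields

def add_field(name, description):
    """Add a field mid-review, document the revision, and flag all
    previously abstracted publications for re-abstraction."""
    form_fields[name] = description
    revision_log.append((date.today().isoformat(), f"added field: {name}"))
    reabstraction_queue.extend(abstracted)

# The unanticipated concept from the worked example above:
add_field("timing_of_sampling_decisions",
          "Whether sampling is decided a priori, ongoing, or unclear")
```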

Accounting for inconsistent terminology

An important complication affecting the abstraction process in methods overviews is that the language used by authors to describe methods-related concepts can easily vary across publications. For example, authors from different qualitative research traditions often use different terms for similar methods-related concepts. Furthermore, as we found in the sampling overview [ 18 ], there may be cases where no identifiable term, phrase, or label for a methods-related concept is used at all, and a description of it is given instead. This can make searching the text for relevant concepts based on keywords unreliable.

Principle #6:

Since accepted terms may not be used consistently to refer to methods concepts, it is necessary to rely on the definitions for concepts, rather than keywords, to identify relevant information in the publication to abstract.

Strategy #6:

An effective means to systematically identify relevant information is to develop and iteratively adjust written definitions for key concepts (corresponding to abstraction fields) that are consistent with and inclusive of as much of the literature reviewed as possible. Reviewers then seek information that matches these definitions (rather than keywords) when scanning a publication for relevant data to abstract.

In the abstraction process for the sampling overview [18], we noted several concepts of interest to the review for which abstraction by keyword was particularly problematic due to inconsistent terminology across publications: sampling, purposeful sampling, sampling strategy, and saturation (for examples, see Additional file 1, Matrices 3a, 3b, 4). We iteratively developed definitions for these concepts by abstracting text from publications that either provided an explicit definition or from which an implicit definition could be derived; this text was recorded in fields dedicated to the concept’s definition. Using a method of constant comparison, we drew on text from the definition fields to inform and modify a centrally maintained definition of the corresponding concept, optimizing its fit and inclusiveness with the literature reviewed. Table 1 shows, as an example, the final definition constructed in this way for one of the central concepts of the review, qualitative sampling.

We applied iteratively developed definitions when making decisions about what specific text to abstract for an existing field, which allowed us to abstract concept-relevant data even if no recognized keyword was used. For example, this was the case for the sampling-related concept, saturation , where the relevant text available for abstraction in one publication [ 26 ]—“to continue to collect data until nothing new was being observed or recorded, no matter how long that takes”—was not accompanied by any term or label whatsoever.
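The definition-centered record keeping described in strategy #6 can be sketched as follows. This is a hypothetical structure of our own, not the authors' actual abstraction fields; only the quoted saturation description and its source [26] come from the text above.

```python
# Each concept keeps a centrally maintained working definition plus the
# source text it has been compared against. When newly abstracted text does
# not fit the current definition, the definition is revised and the
# superseded wording is retained for transparency (constant comparison).
definitions = {
    "saturation": {
        "current": "Collecting data until no new information is observed",
        "sources": [],   # (publication, abstracted text) pairs
        "history": [],   # superseded wordings
    }
}

def compare_definition(concept, publication, text, revised=None):
    entry = definitions[concept]
    entry["sources"].append((publication, text))
    if revised is not None:        # the text did not fit: revise centrally
        entry["history"].append(entry["current"])
        entry["current"] = revised

# The unlabeled description quoted above [26] matches the working
# definition, so it is recorded without triggering a revision:
compare_definition("saturation", "Cohen et al. 2000 [26]",
                   "to continue to collect data until nothing new was being "
                   "observed or recorded, no matter how long that takes")
```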

This comparative analytic strategy (and our approach to analysis more broadly as described in strategy #7, below) is analogous to the process of reciprocal translation —a technique first introduced for meta-ethnography by Noblit and Hare [ 27 ] that has since been recognized as a common element in a variety of qualitative metasynthesis approaches [ 28 ]. Reciprocal translation, taken broadly, involves making sense of a study’s findings in terms of the findings of the other studies included in the review. In practice, it has been operationalized in different ways. Melendez-Torres and colleagues developed a typology from their review of the metasynthesis literature, describing four overlapping categories of specific operations undertaken in reciprocal translation: visual representation, key paper integration, data reduction and thematic extraction, and line-by-line coding [ 28 ]. The approaches suggested in both strategies #6 and #7, with their emphasis on constant comparison, appear to fall within the line-by-line coding category.

Generating credible and verifiable analytic interpretations

The analysis in a systematic methods overview must support its more general objective, which we suggested above is often to offer clarity and enhance collective understanding regarding a chosen methods topic. In our experience, this involves describing and interpreting the relevant literature in qualitative terms. Furthermore, any interpretative analysis required may entail reaching different levels of abstraction, depending on the more specific objectives of the review. For example, in the overview on sampling [18], we aimed to produce a comparative analysis of how multiple sampling-related topics were treated differently within and among different qualitative research traditions. To promote credibility of the review, however, one should not only seek a qualitative analytic approach that facilitates reaching varying levels of abstraction; that approach must also ensure that abstract interpretations are supported and justified by the source data rather than being solely the product of the analyst’s speculative thinking.

Principle #7:

Considering the qualitative nature of the analysis required in systematic methods overviews, it is important to select an analytic method whose interpretations can be verified as being consistent with the literature selected, regardless of the level of abstraction reached.

Strategy #7:

We suggest employing the constant comparative method of analysis [29] because it supports developing and verifying analytic links to the source data throughout progressively interpretive or abstract levels. In applying this approach, we advise rigorously documenting how supportive quotes or references to the original texts are carried forward in the successive steps of analysis to allow for easy verification.

The analytic approach used in the methods overview on sampling [18] comprised four explicit steps, progressing in level of abstraction—data abstraction, matrices, narrative summaries, and final analytic conclusions (Fig. 2). While we have positioned data abstraction as the second stage of the generic review process (prior to analysis) above, we also considered it an initial step of analysis in the sampling overview for several reasons. First, it involved a process of constant comparisons and iterative decision-making about the fields to add or define during development and modification of the abstraction form, through which we established the range of concepts to be addressed in the review. At the same time, abstraction involved continuous analytic decisions about what textual quotes (ranging in size from short phrases to numerous paragraphs) to record in the fields thus created. This constant comparative process was analogous to open coding, in which textual data from publications were compared to conceptual fields (equivalent to codes) or to other instances of data previously abstracted when constructing definitions to optimize their fit with the overall literature, as described in strategy #6. Finally, in the data abstraction step, we also recorded our first interpretive thoughts in dedicated fields, providing initial material for the more abstract analytic steps.

Fig. 2 Summary of progressive steps of analysis used in the methods overview on sampling [18]

In the second step of the analysis, we constructed topic-specific matrices, or tables, by copying relevant quotes from abstraction forms into the appropriate cells of matrices (for the complete set of analytic matrices developed in the sampling review, see Additional file 1, Matrices 3 to 10). Each matrix ranged from one to five pages; row headings, nested three-deep, identified the methodological tradition, author, and publication, respectively; and column headings identified the concepts, which corresponded to abstraction fields. Matrices thus allowed us to make further comparisons across methodological traditions, and between authors within a tradition. In the third step of analysis, we recorded our comparative observations as narrative summaries, in which we used illustrative quotes more sparingly. In the final step, we developed analytic conclusions, based on the narrative summaries, about the sampling-related concepts within each methodological tradition for which clarity, consistency, or comprehensiveness of the available guidance appeared to be lacking. Higher levels of analysis thus built logically from the lower levels, enabling us to verify analytic conclusions easily by tracing the support for claims back to the original text of the publications reviewed.
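The matrix structure described in this second step maps naturally onto a hierarchical table. The sketch below is our illustration using pandas (the medium used in the original review is not specified here, and the tooling is our choice); the traditions reflect those named in the review, while the authors, publications, and cell quotes are placeholders.

```python
import pandas as pd

# Row headings nested three-deep (tradition > author > publication),
# column headings as concepts, cells holding verbatim quotes copied from
# abstraction forms so that claims remain traceable to their sources.
rows = pd.MultiIndex.from_tuples(
    [("Grounded theory", "Author A", "Book A1"),
     ("Grounded theory", "Author B", "Book B1"),
     ("Phenomenology",   "Author C", "Book C1")],
    names=["Tradition", "Author", "Publication"])

matrix = pd.DataFrame({
    "Sampling strategy": ["quote A1...", "quote B1...", "quote C1..."],
    "Saturation":        ["quote A2...", "quote B2...", "quote C2..."],
}, index=rows)

# Comparisons between authors within one tradition, and across traditions:
print(matrix.loc["Grounded theory"])
print(matrix["Saturation"])
```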

Integrative versus interpretive methods overviews

The analytic product of systematic methods overviews is comparable to qualitative evidence syntheses, since both involve describing and interpreting the relevant literature in qualitative terms. Most qualitative synthesis approaches strive to produce new conceptual understandings that vary in level of interpretation. Dixon-Woods and colleagues [30] elaborate on a useful distinction, originating from Noblit and Hare [27], between integrative and interpretive reviews. Integrative reviews focus on summarizing available primary data and involve using largely secure and well-defined concepts to do so; definitions are used from an early stage to specify categories for abstraction (or coding) of data, which in turn supports their aggregation; they do not seek as their primary focus to develop or specify new concepts, although they may achieve some theoretical or interpretive functions. For interpretive reviews, meanwhile, the main focus is to develop new concepts and theories that integrate them, with the implication that the concepts developed become fully defined towards the end of the analysis. These two forms are not completely distinct, and “every integrative synthesis will include elements of interpretation, and every interpretive synthesis will include elements of aggregation of data” [30].

The example methods overview on sampling [ 18 ] could be classified as predominantly integrative because its primary goal was to aggregate influential authors’ ideas on sampling-related concepts; there were also, however, elements of interpretive synthesis since it aimed to develop new ideas about where clarity in guidance on certain sampling-related topics is lacking, and definitions for some concepts were flexible and not fixed until late in the review. We suggest that most systematic methods overviews will be classifiable as predominantly integrative (aggregative). Nevertheless, more highly interpretive methods overviews are also quite possible—for example, when the review objective is to provide a highly critical analysis for the purpose of generating new methodological guidance. In such cases, reviewers may need to sample more deeply (see strategy #4), specifically by selecting empirical research reports (i.e., to go beyond dominant or influential ideas in the methods literature) that are likely to feature innovations or instructive lessons in employing a given method.

Conclusions

In this paper, we have outlined tentative guidance in the form of seven principles and strategies on how to conduct systematic methods overviews, a review type in which methods-relevant literature is systematically analyzed with the aim of offering clarity and enhancing collective understanding regarding a specific methods topic. Our proposals include strategies for delimiting the set of publications to consider, searching beyond standard bibliographic databases, searching without the availability of relevant metadata, selecting publications on purposeful conceptual grounds, defining concepts and other information to abstract iteratively, accounting for inconsistent terminology, and generating credible and verifiable analytic interpretations. We hope the suggestions proposed will be useful to others undertaking reviews on methods topics in future.

As far as we are aware, this is the first published source of concrete guidance for conducting this type of review. It is important to note that our primary objective was to initiate methodological discussion by stimulating reflection on what rigorous methods for this type of review should look like, leaving the development of more complete guidance to future work. While derived from the experience of reviewing a single qualitative methods topic, we believe the principles and strategies provided are generalizable to overviews of both qualitative and quantitative methods topics. However, it is expected that additional challenges and insights for conducting such reviews have yet to be defined. Thus, we propose that next steps for developing more definitive guidance should involve an attempt to collect and integrate other reviewers’ perspectives and experiences in conducting systematic methods overviews on a broad range of qualitative and quantitative methods topics. Formalized guidance and standards would improve the quality of future methods overviews, something we believe has important implications for advancing qualitative and quantitative methodology. When undertaken to a high standard, rigorous critical evaluations of the available methods guidance have significant potential to make implicit controversies explicit, and improve the clarity and precision of our understandings of problematic qualitative or quantitative methods issues.

A review process central to most types of rigorous reviews of empirical studies, which we did not explicitly address in a separate review step above, is quality appraisal . The reason we have not treated this as a separate step stems from the different objectives of the primary publications included in overviews of the methods literature (i.e., providing methodological guidance) compared to the primary publications included in the other established review types (i.e., reporting findings from single empirical studies). This is not to say that appraising quality of the methods literature is not an important concern for systematic methods overviews. Rather, appraisal is much more integral to (and difficult to separate from) the analysis step, in which we advocate appraising clarity, consistency, and comprehensiveness—the quality appraisal criteria that we suggest are appropriate for the methods literature. As a second important difference regarding appraisal, we currently advocate appraising the aforementioned aspects at the level of the literature in aggregate rather than at the level of individual publications. One reason for this is that methods guidance from individual publications generally builds on previous literature, and thus we feel that ahistorical judgments about comprehensiveness of single publications lack relevance and utility. Additionally, while different methods authors may express themselves less clearly than others, their guidance can nonetheless be highly influential and useful, and should therefore not be downgraded or ignored based on considerations of clarity—which raises questions about the alternative uses that quality appraisals of individual publications might have. Finally, legitimate variability in the perspectives that methods authors wish to emphasize, and the levels of generality at which they write about methods, makes critiquing individual publications based on the criterion of clarity a complex and potentially problematic endeavor that is beyond the scope of this paper to address. By appraising the current state of the literature at a holistic level, reviewers stand to identify important gaps in understanding that represent valuable opportunities for further methodological development.

To summarize, the principles and strategies provided here may be useful to those seeking to undertake their own systematic methods overview. Additional work is needed, however, to establish guidance that is comprehensive by comparing the experiences from conducting a variety of methods overviews on a range of methods topics. Efforts that further advance standards for systematic methods overviews have the potential to promote high-quality critical evaluations that produce conceptually clear and unified understandings of problematic methods topics, thereby accelerating the advance of research methodology.

References

1. Hutton JL, Ashcroft R. What does “systematic” mean for reviews of methods? In: Black N, Brazier J, Fitzpatrick R, Reeves B, editors. Health services research methods: a guide to best practice. London: BMJ Publishing Group; 1998. p. 249–54.
2. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.1.0. The Cochrane Collaboration; 2011.
3. Centre for Reviews and Dissemination. Systematic reviews: CRD’s guidance for undertaking reviews in health care. York: Centre for Reviews and Dissemination; 2009.
4. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.
5. Barnett-Page E, Thomas J. Methods for the synthesis of qualitative research: a critical review. BMC Med Res Methodol. 2009;9(1):59.
6. Kastner M, Tricco AC, Soobiah C, Lillie E, Perrier L, Horsley T, Welch V, Cogo E, Antony J, Straus SE. What is the most appropriate knowledge synthesis method to conduct a review? Protocol for a scoping review. BMC Med Res Methodol. 2012;12(1):114.
7. Booth A, Noyes J, Flemming K, Gerhardus A. Guidance on choosing qualitative evidence synthesis methods for use in health technology assessments of complex interventions. Integrate-HTA; 2016.
8. Booth A, Sutton A, Papaioannou D. Systematic approaches to a successful literature review. 2nd ed. London: Sage; 2016.
9. Hannes K, Lockwood C. Synthesizing qualitative research: choosing the right approach. Chichester: Wiley-Blackwell; 2012.
10. Suri H. Towards methodologically inclusive research syntheses: expanding possibilities. New York: Routledge; 2014.
11. Campbell M, Egan M, Lorenc T, Bond L, Popham F, Fenton C, Benzeval M. Considering methodological options for reviews of theory: illustrated by a review of theories linking income and health. Syst Rev. 2014;3(1):1–11.
12. Cohen DJ, Crabtree BF. Evaluative criteria for qualitative research in health care: controversies and recommendations. Ann Fam Med. 2008;6(4):331–9.
13. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349–57.
14. Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7(2):e1000217.
15. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG. Epidemiology and reporting characteristics of systematic reviews. PLoS Med. 2007;4(3):e78.
16. Chan AW, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet. 2005;365(9465):1159–62.
17. Alshurafa M, Briel M, Akl EA, Haines T, Moayyedi P, Gentles SJ, Rios L, Tran C, Bhatnagar N, Lamontagne F, et al. Inconsistent definitions for intention-to-treat in relation to missing outcome data: systematic review of the methods literature. PLoS One. 2012;7(11):e49163.
18. Gentles SJ, Charles C, Ploeg J, McKibbon KA. Sampling in qualitative research: insights from an overview of the methods literature. Qual Rep. 2015;20(11):1772–89.
19. Harzing A-W, Alakangas S. Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison. Scientometrics. 2016;106(2):787–804.
20. Harzing A-WK, van der Wal R. Google Scholar as a new source for citation analysis. Ethics Sci Environ Polit. 2008;8(1):61–73.
21. Kousha K, Thelwall M. Google Scholar citations and Google Web/URL citations: a multi-discipline exploratory analysis. J Assoc Inf Sci Technol. 2007;58(7):1055–65.
22. Hirsch JE. An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A. 2005;102(46):16569–72.
23. Booth A, Carroll C. How to build up the actionable knowledge base: the role of ‘best fit’ framework synthesis for studies of improvement in healthcare. BMJ Qual Saf. 2015;24(11):700–8.
24. Carroll C, Booth A, Leaviss J, Rick J. “Best fit” framework synthesis: refining the method. BMC Med Res Methodol. 2013;13(1):37.
25. Carroll C, Booth A, Cooper K. A worked example of “best fit” framework synthesis: a systematic review of views concerning the taking of some potential chemopreventive agents. BMC Med Res Methodol. 2011;11(1):29.
26. Cohen MZ, Kahn DL, Steeves DL. Hermeneutic phenomenological research: a practical guide for nurse researchers. Thousand Oaks: Sage; 2000.
27. Noblit GW, Hare RD. Meta-ethnography: synthesizing qualitative studies. Newbury Park: Sage; 1988.
28. Melendez-Torres GJ, Grant S, Bonell C. A systematic review and critical appraisal of qualitative metasynthetic practice in public health to develop a taxonomy of operations of reciprocal translation. Res Synth Methods. 2015;6(4):357–71.
29. Glaser BG, Strauss A. The discovery of grounded theory. Chicago: Aldine; 1967.
30. Dixon-Woods M, Agarwal S, Young B, Jones D, Sutton A. Integrative approaches to qualitative and quantitative evidence. UK National Health Service; 2004. p. 1–44.


Acknowledgements

Not applicable.

Funding

There was no funding for this work.

Availability of data and materials

The systematic methods overview used as a worked example in this article (Gentles SJ, Charles C, Ploeg J, McKibbon KA: Sampling in qualitative research: insights from an overview of the methods literature. The Qual Rep 2015, 20(11):1772-1789) is available from http://nsuworks.nova.edu/tqr/vol20/iss11/5 .

Authors’ contributions

SJG wrote the first draft of this article, with CC contributing to drafting. All authors contributed to revising the manuscript. All authors except CC (deceased) approved the final draft. SJG, CC, KAM, and JP were involved in developing methods for the systematic methods overview on sampling.

Competing interests

The authors declare that they have no competing interests.

Authors and affiliations

Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada

Stephen J. Gentles, Cathy Charles & K. Ann McKibbon

Faculty of Social Work, University of Calgary, Alberta, Canada

David B. Nicholas

School of Nursing, McMaster University, Hamilton, Ontario, Canada

Jenny Ploeg

CanChild Centre for Childhood Disability Research, McMaster University, 1400 Main Street West, IAHS 408, Hamilton, ON, L8S 1C7, Canada

Stephen J. Gentles


Corresponding author

Correspondence to Stephen J. Gentles.

Additional information

Cathy Charles is deceased

Additional file

Additional file 1: Analysis matrices. (DOC 330 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Gentles, S.J., Charles, C., Nicholas, D.B. et al. Reviewing the research methods literature: principles and strategies illustrated by a systematic overview of sampling in qualitative research. Syst Rev 5 , 172 (2016). https://doi.org/10.1186/s13643-016-0343-0


Received: 06 June 2016

Accepted: 14 September 2016

Published: 11 October 2016

DOI: https://doi.org/10.1186/s13643-016-0343-0


Keywords

  • Systematic review
  • Literature selection
  • Research methods
  • Research methodology
  • Overview of methods
  • Systematic methods overview
  • Review methods

Systematic Reviews

ISSN: 2046-4053

  • Submission enquiries: Access here and click Contact Us
  • General enquiries: [email protected]





Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

Loading metrics

Open Access

Peer-reviewed

Research Article

Cardiovascular health and cancer risk associated with plant based diets: An umbrella review

Roles Conceptualization, Data curation, Formal analysis, Writing – original draft

Affiliations Department of Biomedical and Neuromotor Science, Alma Mater Studiorum–University of Bologna, Bologna, Italy, Interdisciplinary Research Center for Health Science, Sant’Anna School of Advanced Studies, Pisa, Tuscany, Italy

ORCID logo

Roles Conceptualization, Formal analysis, Writing – review & editing

Affiliation Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom

Roles Conceptualization, Methodology, Supervision, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Department of Biomedical and Neuromotor Science, Alma Mater Studiorum–University of Bologna, Bologna, Italy

Roles Conceptualization, Supervision, Writing – review & editing

Affiliation Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, CA, United States of America

Affiliation Department of Translational Medicine, University of Eastern Piedmont, (UNIUPO), Novara, Italy

Roles Conceptualization, Data curation, Writing – review & editing

Roles Conceptualization, Methodology, Supervision, Writing – review & editing

Affiliation IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma Neurochirurgia Ipofisi—Pituitary Unit, Bologna, Italy

  • Angelo Capodici, 
  • Gabriele Mocciaro, 
  • Davide Gori, 
  • Matthew J. Landry, 
  • Alice Masini, 
  • Francesco Sanmarchi, 
  • Matteo Fiore, 
  • Angela Andrea Coa, 
  • Gisele Castagna, 

PLOS

  • Published: May 15, 2024
  • https://doi.org/10.1371/journal.pone.0300711
  • Reader Comments

Table 1

Cardiovascular diseases (CVDs) and cancer are the two main leading causes of death and disability worldwide. Suboptimal diet, poor in vegetables, fruits, legumes and whole grain, and rich in processed and red meat, refined grains, and added sugars, is a primary modifiable risk factor. Based on health, economic and ethical concerns, plant-based diets have progressively widespread worldwide.

This umbrella review aims at assessing the impact of animal-free and animal-products-free diets (A/APFDs) on the risk factors associated with the development of cardiometabolic diseases, cancer and their related mortalities.

Data sources

PubMed and Scopus were searched for reviews, systematic reviews, and meta-analyses published from 1st January 2000 to 31st June 2023, written in English and involving human subjects of all ages. Primary studies and reviews/meta-analyses based on interventional trials which used A/APFDs as a therapy for people with metabolic diseases were excluded.

Data extraction

The umbrella review approach was applied for data extraction and analysis. The revised AMSTAR-R 11-item tool was applied to assess the quality of reviews/meta-analyses.

Overall, vegetarian and vegan diets are significantly associated with better lipid profile, glycemic control, body weight/BMI, inflammation, and lower risk of ischemic heart disease and cancer. Vegetarian diet is also associated with lower mortality from CVDs. On the other hand, no difference in the risk of developing gestational diabetes and hypertension were reported in pregnant women following vegetarian diets. Study quality was average. A key limitation is represented by the high heterogeneity of the study population in terms of sample size, demography, geographical origin, dietary patterns, and other lifestyle confounders.

Conclusions

Plant-based diets appear beneficial in reducing cardiometabolic risk factors, as well as CVDs, cancer risk and mortality. However, caution should be paid before broadly suggesting the adoption of A/AFPDs since the strength-of-evidence of study results is significantly limited by the large study heterogeneity alongside the potential risks associated with potentially restrictive regimens.

Citation: Capodici A, Mocciaro G, Gori D, Landry MJ, Masini A, Sanmarchi F, et al. (2024) Cardiovascular health and cancer risk associated with plant based diets: An umbrella review. PLoS ONE 19(5): e0300711. https://doi.org/10.1371/journal.pone.0300711

Editor: Melissa Orlandin Premaor, Federal University of Minas Gerais: Universidade Federal de Minas Gerais, BRAZIL

Received: January 8, 2024; Accepted: March 4, 2024; Published: May 15, 2024

Copyright: © 2024 Capodici et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Cardiovascular diseases (CVDs) and cancer currently represent the leading causes of death and disability worldwide. Studies performed on large cohorts worldwide have identified several modifiable and non-modifiable risk factors. Among them, robust evidence supports diet as a major modifiable risk factor [ 1 ].

A suboptimal diet, marked by insufficient consumption of fruits, vegetables, legumes, and whole grains, coupled with an excessive intake of meat (particularly red and processed), salt, refined grains and sugar, has been shown to notably elevate both mortality rates and disability-adjusted life years. Over time, these dietary choices have led to a concerning increase in health-related issues [ 1 , 2 ].

Additionally, the reduction of products of animal origin in favor of vegetarian ones has been suggested to reduce CVD and cancer risk [ 3 , 4 ]. Several major professional and scientific organizations encourage the adoption of vegetarian and vegan diets for the prevention and treatment of a range of chronic metabolic diseases such as atherosclerosis, type 2 diabetes, hypertension and obesity [ 5 , 6 ]. Ethical, environmental, and socio-economic concerns have contributed to the widespread growth of plant-based diets, particularly vegetarian and vegan options [ 7 – 9 ]. 2014 cross-national governmental survey estimated that approximately 75 million people around the globe deliberately followed a vegetarian diet, while an additional 1,45 million were obliged to because of socio-economic factors [ 10 , 11 ].

At the same time, study heterogeneity in terms of plant-based dietary regimens (from limitation of certain types to the total exclusion of animal products), their association with other lifestyle factors, patient demographic and geographical features, associated diseases, as well as study design and duration, significantly limit the assessment of the real benefits associated with animal-free and animal-products-free diets (A/APFDs). Finally, an increasing number of studies have highlighted the potential threatening consequences of chronic vitamin and mineral deficiencies induced by these diets (e.g., megaloblastic anemia due to vitamin B12 deficiency), especially more restrictive ones and in critical periods of life, like pregnancy and early childhood [ 5 ].

Based on these premises, our umbrella review aims at assessing the impact of animal-free and animal-products-free diets (A/APFDs) on the risk factors associated with the development of cardiometabolic diseases, cancer and their related mortalities in both the adult and the pediatric population, as well as pregnant women.

Search strategy

PubMed ( https://pubmed.ncbi.nlm.nih.gov/ ) and Scopus ( https://www.scopus.com/search/form.uri?display=basic#basic ) databases were searched for reviews, systematic reviews and meta-analyses published from 1st January 2000 to 31st June 2023. We considered only articles written in English, involving human subjects, with an available abstract, and answering to the following PICO question: P (population): people of all ages; I (intervention) and C (comparison): people adopting A/APFDs vs. omnivores; O (outcome): impact of A/APFD on health parameters associated with CVDs, metabolic disorders or cancer.

Articles not specifying the type of A/APFD regimen were excluded. If not detailed, the A/APFDs adopted by study participants was defined as “mixed diet”. Vegetarian diets limiting but not completely excluding certain types of meat/fish (i.e. pesco- or pollo-vegetarian diet) were excluded. Studies focusing on subjects with specific nutritional needs (i.e., athletes or military personnel) -except pregnant women-, or with known underlying chronic diseases (i.e., chronic kidney disease), as well as articles focusing on conditions/health parameters related to disorders different from CVDs or cancer, and, finally, reviews/meta-analyses including interventional studies assessing A/APFDs comparing it with pharmacological interventions were excluded.

Ad hoc literature search strings, made of a broad selection of terms related to A/APFDs, including PubMed MeSH-terms, free-text words and their combinations, combined by proper Boolean operators, were created to search PubMed database: ((vegetari* OR vegan OR Diet , Vegetarian[MH] OR fruitar* OR veganism OR raw-food* OR lacto-veget* OR ovo-vege* OR semi-veget* OR plant-based diet* OR vegetable-based diet* OR fruit-based diet* OR root-based diet OR juice-based diet OR non-meat eate* OR non-meat diet*) AND ((review[Publication Type]) OR (meta-analysis[Publication Type]))) AND (("2000/01/01"[Date—Publication] : "2023/06/31"[Date—Publication])) and Scopus database: ALL(vegetari* OR vegan OR Diet , Vegetarian OR fruitar* OR veganism OR raw-food* OR lacto-veget* OR ovo-vege* OR semi-veget* OR plant-based diet* OR vegetable-based diet* OR fruit-based diet* OR root-based diet OR juice-based diet OR non-meat eate* OR non-meat diet) AND SUBJAREA(MEDI OR NURS OR VETE OR DENT OR HEAL OR MULT) PUBYEAR > 1999 AND (LIMIT-TO (DOCTYPE , "re"))

Research design and study classification

An umbrella review approach [ 12 ] was applied to systematically assess the effect of A/APFDs on risk factors related to CVDs, metabolic disorders and cancer as derived from literature reviews, systematic reviews and meta-analyses ( Table 1 ).

thumbnail

  • PPT PowerPoint slide
  • PNG larger image
  • TIFF original image

https://doi.org/10.1371/journal.pone.0300711.t001

Study selection

The list of articles identified by literature search was split into 5 equivalent parts, each assigned to a couple of readers (AC, DG, CW, ML, AM, FS, MF, AAC, GC and FG), who independently and blindly read the title and then the abstract of each article to define its pertinence. Papers included in the umbrella review had to focus on one/some of the following A/APFDs: vegans, lacto-vegetarians, ovo-vegetarians, lacto-ovo-vegetarians. No restriction was applied for age, gender, ethnicity, geographical origin, nor socio economic status. Primary studies, reviews/meta-analyses not written in English, or focusing on non-previously mentioned dietary regimens (including the Mediterranean diet) were excluded. Abstract meetings, editorials, letters to the editor, and study protocols were also excluded. To reduce study heterogeneity, at least in terms of dietary regimens, we excluded studies based on vegetarian regimens limiting but not avoiding fish or poultry, and prospective trials directly comparing A/AFPDs to pharmacological interventions.

In case of discordance between readers, we resorted to discussion amongst the authors to resolve it, based on the article’s abstract or, if not decisive, the full text. The study selection process is summarized in Fig 1 .

thumbnail

https://doi.org/10.1371/journal.pone.0300711.g001

This review was registered on PROSPERO (Record ID: 372913 https://www.crd.york.ac.uk /prospero/display_record.php?RecordID=372913 ).

Quality literature analysis

Three raters (AC, DG, FS) independently and blindly assessed the quality of the systematic reviews and meta-analyses using the revised AMSTAR-R 11-item tool, developed by the PEROSH group [ 13 ]. In case of disagreement, the score of each item and the final decision were discussed among the three raters.

Data extraction and reporting

Ten investigators (AC, DG, GM, ML, AM, FS, MF, AAC, GC, FG) independently extracted data from eligible articles. Disagreements in data extraction were resolved by consensus. Using a predefined protocol and a Microsoft Excel sheet, the following data were extracted: first author’s affiliation country; type of review; type of diet; target population; number of aggregated participants; total cholesterol; HDL-cholesterol; LDL-cholesterol; triglycerides; apolipoprotein B; C-Reactive Protein (CRP); Body Mass Index (BMI); body weight; fasting glucose; glycosylated hemoglobin (HbA1c); systolic blood pressure; diastolic blood pressure; cardiac events (type; risk); cardiovascular diseases (type; risk); gestational diabetes; gestational hypertension; cancer (type; risk); death due to CVDs/cancer (risk). Data were reported as mean difference (MD), weighted mean difference (WMD), standardized mean difference (SMD), and 95%CI, while the estimated risk could be reported as relative risk (RR), odds ratio (OR), or hazard ratio (HR), according to the data reported by the study authors. Articles assessing the risk of gestational diabetes and hypertension, as well as risk of low birth weight, and their determinants were examined separately.

Results from studies focusing on both vegetarian and vegan diets were analyzed and reported separately if authors had stratified the results according to the type of diet. On the contrary, if data from vegan and vegetarian subjects were mixed, we arbitrarily considered all of them as “vegetarian”.

Group 1: Cardiovascular endpoints and risk factors

I. total cholesterol (tc)..

Eight studies examined the levels of total serum cholesterol (TC) in vegetarians. Two focused on the general population and included 5,561 [ 14 ] and 576 [ 15 ] respectively, and, based on data meta-analysis, found a significant reduction in TC among vegetarians and people who assumed plant-based proteins (MD: -1.56 mmol/L; 95%CI: −1.73, −1.39; and -0.11 mmol/L; 95%CI: −0.22, −0.01, respectively).

Data were confirmed by Wang et al. (N = 832 total; Ovolacto/lacto-vegetarians: 291) [ 16 ], showing a greater dietary effect in subjects with a BMI ranging from 18.5 to 25 kg/m 2 (mean TC reduction: −0.94 mmol/L; 95%CI: −1.33, −0.55), and from 25 to 30 kg/m 2 (−0.58 mmol/L; 95%CI: −0.89, −0.27), than in those with a BMI >30 kg/m 2 (−0.16 mmol/L; 95%CI: −0.30, −0.01), and by Xu et al. (N = 783) [ 17 ], reporting lower TC in overweight and obese people (WMD: −0.37 mmol/L; 95%CI: −0.52, −0.22) adopting a vegetarian diet.

Another systematic review by Elliott et al., including 27 randomized controlled trials on plant based vs. normal western diets [ 18 ], found lower TC levels in vegetarians. These results were in line with other two descriptive reviews, the first including 2,890 overweight/obese adults [ 19 ], the second 8,969 vegetarian children aged 0–18 years [ 20 ]. Furthermore, a meta-analysis by Liang et al. described significantly lower TC (from -0.36 to -0.24 mmol/L) in people adopting plant based diets vs. people adopting western habitual diets [ 21 ].

Moreover, the review and meta-analysis by Dinu et al. [ 14 ], based on 19 studies for a total of 1,272 adults, reported significantly lower levels of TC among vegans than in omnivores (WMD: −1.72 mmol/L; 95%CI: −1.93, −1.51).

II. High-density lipoprotein cholesterol (HDL-C).

Eight reviews focused on the effects of vegetarian diet on serum high-density lipoprotein cholesterol (HDL-C) levels. Six [ 15 , 17 , 18 , 21 – 23 ] found no significant difference between vegetarians and omnivores, when considering normal weight and overweight/obese people. On the contrary, the study by Dinu et al. [ 14 ], based on 51 studies, for a total of 6,194 vegetarian adults, reported a WMD −0.15 mmol/L (95%CI: −0.19, −0.11). Liang et al. [ 21 ] analyzed 4 studies and reported a pooled estimated MD of −0.10 mmol/L (95%CI: −0.14, −0.05; p<0.001) in vegetarian diet adopters vs. western diets adopters. Finally, Zhang et al. [ 22 ] did not find any statistically significant differences in HDL-C levels when assessing vegetarian diets compared to non-vegetarians; on the same note Dinu et al. [ 14 ], analyzing data from 15 studies, for a total of 1,175 adults, found no significant differences in HDL-C levels between vegans and people following other dietary regimens.

III. Low-density lipoprotein cholesterol (LDL-C).

Ten reviews summarized the effect of vegetarian diets on serum levels of low-density lipoprotein cholesterol (LDL-C). Seven [ 14 – 18 , 21 , 23 ] found significantly lower LDL-C levels associated with vegetarian diet, both in the general population and in diabetic patients. In particular, Elliot et al. [ 18 ], analyzing 43 observational and interventional studies, described lower LDL-C in people adopting plant based diets; a significant difference was reported by the study of Liang et al. [ 21 ] based on 68 studies (MD: -0.29 to -0.17), and similar to data by Lamberg et al. [ 15 ], based on 13 RCTs including for a total of 576 participants (MD: -0.14 mmol/L; 95%CI: -0.25, -0.02). The impact of vegetarian diet appeared even greater in overweight or obese people, according to the analysis by Xu et al. [ 17 ], based on 7 RCTs (N = 783; MD: -0.31 mmol/L; 95%CI: -0.46, -0.16). Two reviews [ 19 , 20 ] reported similar results in overweight/obese patients and children aged 0–18 years, but no meta-analyses were conducted. Wang et al. [ 16 ] reported a MD of −0.34 mmol/L (95%CI: −0.57, −0.11; p<0.001) in the general adult population. Ferdowsian et al. [ 23 ] reported an overall reduction of LDL-C associated with vegetarian diet, but no synthesis analyses were performed. Dinu et al. [ 14 ] analyzed 46 studies encompassing 5,583 vegetarians and found a WMD of -1.18 mmol/L (95%CI: -1.34, -1.01). Finally, Viguiliouk et al. [ 24 ] reported a MD of −0.12 mmol/L (95%CI: −0.20, −0.04) in 6 trials involving 602 diabetic patients.

Four reviews identified a significant reduction in LDL-C in vegans as compared to omnivores [ 14 , 19 , 23 , 25 ]. Benatar et al. [ 25 ] analyzed 31 studies, for a total of 3,355 healthy vegan adults and 53,393 non-vegan controls and found MD of -0.49 mmol/L (95%CI: -0.62, -0.36; p<0.0001). Ferdowsian et al. [ 23 ] reported a reduction of LDL-C in healthy vegans, and Ivanova et al. [ 19 ] in overweight patients, but no meta-analysis was performed. Finally, Dinu et al. [ 14 ] analyzed 13 studies, for a total of 728 healthy vegan adults, and found a significant LDL-C reduction (WMD: −1.27 mmol/L; 95%CI: −1.66, −0.88).

IV. Triglycerides (TG).

Seven systematic reviews [ 14 , 16 – 18 , 20 , 23 , 26 ] analyzed serum triglycerides (TG) in vegetarians vs. omnivores. Specifically, Wang et al. [ 16 ] described no differences between the two, with a pooled estimated effect of 0.04 mmol/L (95%CI: −0.05, 0.13; p = 0.4). Zhang et al. [ 26 ] analyzing 12 studies for a total of 1,300 subjects, found a MD of −1.28 mmol/L (95%CI; −2.14, −0.42). Schürmann et al. and Ferdowsian et al. [ 20 , 23 ] reported lower TG in vegetarians in both children and adults but did not perform data meta-analysis. Dinu et al. [ 14 ] analyzed 55 studies including 4,008 vegetarians and found a WMD of −0.63 mmol/L (95%CI: −0.97, −0.30; p = 0.02). Conversely, in the review by Elliott et al. [ 18 ] no differences were reported in triglycerides. Xu et al. [ 17 ] reported a significant increase of TG (WMD: 0.29 mmol/L; 95%CI: 0.11, 0.47) in vegetarians as compared to meat eaters.

The effect of vegan diet on TG remains debated as one review [ 23 ] reported significative changes in TGs (-0.14 mmol/L, CI -0.24 to -0.05), while another [ 14 ] did not find any differences between vegans and omnivores since, after having analyzed 13 studies for 483 vegans, they reported a WMD of -0.52 mmol/L (95%CI: -1.13; 0.09).

V. C-reactive protein (CRP).

Three studies reported lower C-reactive protein (CRP) levels in normal weight, overweight and obese vegetarians as compared to non-vegetarians. Craddock et al. and Menzel et al. reported a WMD of -0.61 mg/L (95%CI: -0.91, -0.32; p = 0.0001) [ 27 ]; -0.25 mg/L (95%CI: -0.49, 0; p = 0.05) [ 28 ], respectively.

Data derived from the analysis by Menzel et al. [ 28 ] in vegan subjects were in line with previously mentioned studies performed in vegetarians (WMD: -0.54 mg/L; 95%CI: -0.79, -0.28; p<0.0001).

Two reviews [ 29 , 30 ] focused on the effects of mixed vegetarian diets on CRP levels. The first [ 29 ] included 2,689 obese patients and found a WMD of -0.55 mg/L (95%CI: -0.78, -0.32; I 2 = 94.4%), while the other [ 30 ], based on 2,398 normal weight subjects found no significant differences between vegetarians and omnivores in the primary analysis; alas, when considering a minimum duration of two years vegetarianism they described lower CRP levels vs. omnivores (Hedges’ g = -0.29; 95%CI: -0.59, 0.01).

VI. Plant-based diets and lipids.

Three studies [ 23 , 26 , 31 ] assessed the lipid profile in people following plant-based diets (without differentiating among diet subtypes) in comparison with omnivores. All of them found significantly lower levels of TC, HDL-C and LDL-C in subjects following plant-based diets. Specifically, Yokoyama et al. [ 31 ] reported a WMD of −1.62 mmol/L (95%CI: −1.92, −1.32; p< 0.001; I 2 = 81.4) for TC, −1.27 mmol/L (95%CI: −1.55, −0.99; p< 0.001; I 2 = 83.3) for LDL-C, −0.2 mmol/L (95%CI: −0.26, −0.14; p< 0.001; I 2 = 49.7) for HDL-C, and −0.36 mmol/L; 95%CI: −0.78, 0.06; p = 0.092; I 2 = 83.0) for TG when considering observational studies, and of −0.69 mmol/L (95%CI: −0.99, −0.4; p<0.001; I 2 = 54.8) for TC, −0.69 mmol/L (95%CI: −0.98, −0.37; p<0.001; I 2 = 79.2) for LDL-C, −0.19 mmol/L (95%CI: −0.24, −0.14; p<0.001; I 2 = 8.5) for HDL-C, and a non-statistically significant increase of TG based on prospective cohort studies. Additionally, Zhang et al. [ 26 ] in their meta-analysis, including 1,300 subjects, found a SMD of -1.28 mmol/L in TG (95% CI -2.14 to -0.42).

Finally, Picasso et al. [ 32 ] did not find any differences in triglycerides for mixed vegetarian diets (MD: 0.04 mmol/L; 95%CI: -0.09, 0.28), but did find statistically significant differences in HDL-C (MD: -0.05 mmol/L; 95%CI: -0.07, -0.03).

VII. Blood pressure.

A . Systolic blood pressure (SBP) . Various studies found significantly lower mean levels of systolic blood pressure (SBP) levels in vegetarians compared to the general population [ 33 – 36 ]. Specifically, Gibbs et al. [ 33 ] reported a SMD of -5.47 mmHg (95%CI: -7.60, -3.34; p<0.00001) in ovo-lacto-vegetarians, as did Lee et al. [ 34 ] reporting a SMD of -1.75 mmHg (95%CI: -5.38, 1.88; p = 0.05); furthermore, they reported a SBP decreased by -2.66 mmHg (95%CI: -3.76, -1.55), in people adopting generic vegetarian diets. Moreover, Garbett et al. [ 35 ] reported a 33% lower prevalence of hypertension in vegetarians vs. nonvegetarians. On the contrary, Schwingshackl et al. [ 36 ], analyzing data from 67 clinical trials overall including 17,230 pre-hypertensive and hypertensive adult patients with a BMI between 23.6 and 45.4 kg/m 2 , followed for 3 to 48 months, did not find any significant reductions in SBP associated with vegetarian diet.

Four reviews investigated the differences in SBP between vegans and non-vegans. Benatar et al. and Lee et al. [ 25 , 34 ] reported significantly lower mean SBP levels in vegans vs. omnivores (MD: -2.56 mmHg; 95%CI: -4.66, -0.45; and WMD: -3.12 mmHg; 95%CI: -4.54, -1.70; p<0.001, respectively). On the other hand, Gibbs et al. [-1.30 mmHg (95%CI: -3.90,1.29)] and Lopez et al. (-1.33 mmHg; 95%CI: −3.50, 0.84; P = 0.230) [ 33 , 37 ] did not find any significant difference in mean SBP levels between vegans and omnivores.

Both reviews [ 32 , 38 ] focusing on SBP in mixed-plant-based dietary patterns found significantly lower levels in vegetarians than in omnivores. The meta-analysis by Picasso et al. [ 32 ], based on 4 RCTs did not find any differences, alas, analyzing 42 cross sectional studies, they described a MD of -4.18 mmHg (95%CI -5.57, -2.80; p<0.00001), in agreement with Yokoyama et al. [ 38 ], who reported a MD of -4.8 mmHg (95%CI: -6.6, -3.1; p<0.001; I 2 = 0) according to the 7 controlled trials, 6 of which being randomized (311 participants), included in the analysis, and of -6.9 mmHg (95%CI: -9.1, -4.7; p<0.001; I 2 = 91.4) based on the other 32 observational studies (21,604 participants).

B . Diastolic blood pressure (DBP) . Garbett et al. [ 35 ] reported reduced mean diastolic blood pressure (DBP) values in vegetarians vs. omnivores, confirmed by the analysis of Gibbs et al. [ 33 ] (WMD: –2.49 mmHg; 95%CI: –4.17, –0.80; p = 0.004; I 2 = 0%) in ovo-lacto-vegetarians, by Lee et al. [ 34 ] [WMD: -1.69 mmHg (95%CI: -2.97, -0.41; p<0.001)] who included 15 randomized controlled trials (N = 856) performed in vegetarians; and by Yokoyama et al. [ 38 ], who highlighted a MD -2.2 mmHg (95%CI: -3.5, -1.0; p<0.001; I 2 = 0%) and -4.7 mmHg (95%CI: -6.3, -3.1; p<0.001; I 2 = 92.6%) according to data from 7 controlled trials (N = 311) and 32 observational studies (N = 21,604), respectively. Conversely, Schwingshackl et al. [ 36 ] did not find significant differences between vegetarians and non-vegetarians.

Three reviews [ 25 , 34 , 37 ] examined the impact of vegan vs. non-vegan diets on DBP and described statistically significant reductions. Benatar et al. described a reduction in DBP corresponding to a MD of -1.33 mmHg (95%CI: -2.67, -0.02) [ 25 ]. Lee et al. described a reduction in DBP with a WMD of -1.92 mmHg (95%CI: -3.18, -0.66; p<0.001) [ 34 ]. Finally, Lopez et al. [ 37 ] described a reduction amounting to a WMD of -4.10 mmHg (95%CI: -8.14, -0.06).

Four studies agreed on the lower mean DBP levels in subjects following mixed vegetarian diets as compared to omnivores [ 32 – 34 , 38 ], quantified as a MD of -3.03 mmHg (95%CI: -4.93, -1.13; p = 0.002) by Picasso et al. [ 32 ], and as -2.2 mmHg (95%CI: -3.5, -1.0; p<0.001) and -4.7 mmHg (95%CI: -6.3, -3.1; p<0.001) by the analyses performed on clinical trials and observational studies, respectively, by Yokoyama et al. [ 38 ].

VIII. Body weight and body mass index (BMI).

Berkow et al. [ 39 ] identified 40 observational studies comparing the weight status of vegetarians vs. non-vegetarians: 29 reported that the weight/BMI of vegetarians of both genders, of different ethnicities (i.e., African Americans, Nigerians, Caucasians and Asians), and from widely separated geographic areas was significantly lower than that of non-vegetarians, while the other 11 did not find significant differences between the two groups. In female vegetarians, weight was 2.9 to 10.6 kg (6% to 17%) and BMI 2.7% to 15.0% lower than in female non-vegetarians, while the weight of male vegetarians was 4.6 to 12.6 kg (8% to 17%) lower and their BMI 4.6% to 16.3% lower than that of male non-vegetarians. The review by Schürmann et al. [ 20 ], focusing on 8,969 children aged 0–18 years, found similar body weight in vegetarian and vegan children as compared to omnivore ones. Dinu et al. [ 14 ] analyzed data from 71 studies (including 57,724 vegetarians and 199,230 omnivores) and identified a WMD in BMI of -1.49 kg/m² (95%CI: -1.72, -1.25; p<0.0001) in vegetarians when compared to omnivores.

Barnard et al. [ 40 ] found a significant weight reduction in pure ovo-lacto-vegetarians (-2.9 kg; 95%CI: -4.1, -1.6; p<0.0001) compared to non-vegetarians from control groups; in vegans, the mean effect was -3.2 kg (95%CI: -4.0, -2.4; p<0.0001). Overall, they included 490 subjects in their analysis, excluding subjects who did not complete the trials.

Benatar et al. [ 25 ] (including 12,619 vegans and 179,630 omnivores from 40 observational studies) and Dinu et al. [ 14 ] (based on 19 cross-sectional studies, for a total of 8,376 vegans and 123,292 omnivores) reported the same pooled estimate, with a lower mean BMI in vegans vs. omnivores of -1.72 kg/m² (95%CI: -2.30, -1.16) and -1.72 kg/m² (95%CI: -2.21, -1.22; p<0.0001), respectively. The meta-analysis by Long et al. [ 41 ], performed on 27 studies, reported a MD of -0.70 kg/m² (95%CI: -1.38, -0.01) for BMI in vegans vs. omnivores. A systematic review and meta-analysis by Agnoli et al. [ 42 ] found mean BMI to be lower in subjects adhering to mixed vegetarian diets as compared to omnivores. Additionally, Tran et al. [ 43 ] described weight reductions both in clinically healthy subjects and in people who adopted vegetarian diets as a medical prescription, although no meta-analysis was performed.

Finally, Huang et al. [ 44 ] found significant weight differences in both vegans and vegetarians, who lost weight after adopting the diet as part of the intervention arm of the randomized studies included. For vegetarians, the WMD was -2.02 kg (95%CI: -2.80, -1.23) when compared to mixed diets, and for vegans the WMD was -2.52 kg (95%CI: -3.02, -1.98) when compared to vegetarians.

IX. Glucose metabolism.

Viguiliouk et al. [ 24 ] found a significant reduction in HbA1c (MD: −0.29%; 95%CI: −0.45, −0.12) and fasting glucose (MD: −0.56 mmol/L; 95%CI: −0.99, −0.13) in vegetarians vs. non-vegetarians.

The meta-analysis by Dinu et al. [ 14 ] reported a WMD of -0.28 mmol/L (95%CI: -0.33, -0.23) in fasting blood glucose for vegetarians (n = 2,256) vs. omnivores (n = 2,192).

These findings were confirmed by Picasso et al. [ 32 ] who found a MD of -0.26 mmol/L (95% CI: -0.35, -0.17) in fasting glucose in mixed-vegetarian diets as compared to omnivores.

A meta-analysis by Long et al. [ 41 ], based on 27 cross-sectional studies, showed a MD of -0.75 (95%CI: -1.08, -0.42) for insulin resistance, measured as HOMA-IR (a unitless index, ideally below 1), in vegetarians, who in some included studies also adhered to an exercise intervention, as compared to omnivores.
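For reference, HOMA-IR is conventionally derived from fasting measurements using the formula of Matthews et al.; the cited review does not restate it, so it is reproduced here from the standard definition:

```latex
\text{HOMA-IR} = \frac{\text{fasting insulin } (\mu\text{U/mL}) \times \text{fasting glucose } (\text{mmol/L})}{22.5}
```

Since values around 1 correspond to normal insulin sensitivity, a pooled difference of -0.75 is substantial on this scale.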

Lee & Park [ 45 ] reported a significantly lower diabetes risk (OR: 0.73; 95%CI: 0.61, 0.87; p<0.001) in vegetarians vs. non-vegetarians, with the association being stronger in studies conducted in the Western Pacific region and Europe/North America than in those from Southeast Asia.

Regarding vegans, the review by Benatar et al. [ 25 ] determined a mean reduction of 0.23 mmol/L (95%CI: -0.35, -0.10) in fasting blood glucose in vegans (N = 12,619) as compared to omnivores (N = 179,630). The finding was in line with Dinu et al. [ 14 ], who reported a WMD of -0.35 mmol/L (95%CI: -0.69, -0.02; p = 0.04) in fasting blood glucose in vegans (n = 83) vs. omnivores (n = 125).

Finally, a systematic review including 61 studies [ 42 ] found mean values of fasting plasma glucose and T2D risk to be lower in subjects following mixed vegetarian diets as compared to omnivores.

X. Cardiovascular events.

Huang et al. [ 46 ] found a significantly lower risk of ischemic heart disease (IHD) (RR: 0.71; 95%CI: 0.56, 0.87), but no significant differences for cerebrovascular mortality between vegetarians and non-vegetarians. The review by Remde et al. [ 47 ] was not conclusive, as only a few studies showed a reduction of the risk of CVDs for vegetarians versus omnivores, while the others did not find any significant results.

Dybvik et al. [ 48 ], based on 13 cohort studies totaling 844,175 participants (115,392 with CVDs, 30,377 with IHD and 14,419 with stroke), showed that the overall RR for vegetarians vs. non-vegetarians was 0.85 (95%CI: 0.79, 0.92; I² = 68%; 8 studies) for CVD, 0.79 (95%CI: 0.71, 0.88; I² = 67%; 8 studies) for IHD, and 0.90 (95%CI: 0.77, 1.05; I² = 61%; 12 studies) for total stroke, while the RR of IHD in vegans vs. omnivores was 0.82 (95%CI: 0.68, 1.00; I² = 0%; 6 studies).

The meta-analysis by Kwok et al. [ 49 ] was based on 8 studies including 183,321 subjects comparing vegetarians with non-vegetarians. The authors identified a significant reduction of IHD in the Seventh Day Adventist (SDA) cohort, who primarily follow ovo-lacto-vegetarian diets, while other, non-SDA vegetarian diets were associated with only a modest reduction of IHD risk, raising the concern that other lifestyle factors typical of SDA populations, and thus not generalizable to other groups, play a primary role in outcomes. IHD was significantly reduced in both genders (RR: 0.60; 95%CI: 0.43, 0.83), while cardiovascular and cerebrovascular mortality risks were significantly reduced only in men. No significant differences were detected for the risk of cerebrovascular events.

The meta-analysis by Lu et al. [ 50 ] (657,433 participants from cohort studies) reported a lower incidence of total stroke among vegetarians vs. non-vegetarians (HR: 0.66; 95%CI: 0.45, 0.95; I² = 54%), while no differences were identified for incident stroke.

The descriptive systematic review by Babalola et al. [ 3 ] reported that adherence to a plant-based diet was inversely related to heart failure risk and advantageous for the secondary prevention of CHD, particularly if started in adolescence. Another review, by Agnoli et al. [ 42 ], confirmed a lower incidence of CVDs associated with mixed vegetarian diets as compared to omnivorous diets. Finally, Chhabra et al. [ 51 ] found that a vegetarian diet, particularly if started in adolescence and associated with vitamin B intake, can reduce the risk of stroke.

Gan et al. [ 52 ] described a lower risk of CVDs (RR: 0.84; 95%CI: 0.79, 0.89; p<0.05) with high vs. low adherence to plant-based diets, but the same association was not confirmed for stroke (RR: 0.87; 95%CI: 0.73, 1.03).

Group 2: Pregnancy outcomes

The meta-analysis by Foster et al. [ 53 ], performed on 6 observational studies, found significantly lower zinc intake in vegetarians than in meat eaters (-1.53 ± 0.44 mg/day; p = 0.001), but no association with pregnancy outcomes, specifically no increase in low birth weight. The finding was confirmed by Tan et al. [ 54 ], who similarly reported no specific risks, although Asian (India/Nepal) vegetarian mothers exhibited an increased risk of delivering a baby with low birth weight (RR: 1.33; 95%CI: 1.01, 1.76; p = 0.04; I² = 0%); nonetheless, the WMD of neonatal birth weight in the five studies they analyzed suggested no difference between vegetarians and omnivores.

To our knowledge, no reviews/meta-analyses have assessed the risk of zinc deficiency and its association with functional outcomes in pregnancy in relation to mixed or vegan diets.

Group 3: Cancer

The meta-analysis by Parra-Soto et al. [ 55 ], based on 409,110 participants from the UK Biobank study (mean follow-up 10.6 years), found a lower risk of liver, pancreatic, lung, prostate, bladder, colorectal, melanoma, kidney, non-Hodgkin lymphoma and lymphatic cancer, as well as of overall cancer (HRs ranging from 0.29 to 0.70), in non-adjusted models comparing vegetarians with omnivores; when adjusted for sociodemographic and lifestyle factors, multimorbidity and BMI, the associations remained statistically significant only for prostate cancer (HR: 0.57; 95%CI: 0.43, 0.76), colorectal cancer (HR: 0.73; 95%CI: 0.54, 0.99), and all cancers combined (HR: 0.87; 95%CI: 0.79, 0.96). When colorectal cancer was stratified by subtype, a lower risk was observed for colon (HR: 0.69; 95%CI: 0.48, 0.99) and proximal colon (HR: 0.43; 95%CI: 0.22, 0.82) cancer, but not for rectal or distal colon cancer.

Similarly, the analysis by Huang et al. [ 46 ], based on 7 studies for a total of 124,706 subjects, reported a significantly lower overall/total cancer incidence in vegetarians than non-vegetarians (RR 0.82; 95%CI: 0.67, 0.97).

Zhao et al. [ 56 ] found a lower risk of digestive system cancer in plant-based dieters (RR = 0.82, 95%CI: 0.78–0.86; p< 0.001) and in vegans (RR: 0.80; 95%CI: 0.74, 0.86; p<0.001) as compared to meat eaters.

Additionally, DeClercq et al. [ 57 ] reported a decreased risk of overall cancer and colorectal cancer, but inconsistent results for prostate cancer and breast cancer; in partial agreement, Godos et al. [ 58 ] found no significant differences in breast, colorectal, and prostate cancer risk between vegetarians and non-vegetarians.

The umbrella review by Gianfredi et al. [ 59 ] described a lower risk of pancreatic cancer associated with vegetarian diets.

Dinu et al. [ 14 ] reported a reduction in the risk of total cancer of 8% in vegetarians (RR: 0.92; 95%CI: 0.87, 0.98) and of 15% in vegans (RR: 0.85; 95%CI: 0.75, 0.95), as compared to omnivores; nonetheless, they also described non-significant reductions in the risk of mortality from colorectal, breast, lung and prostate cancers. Regarding the latter, a meta-analysis by Gupta et al. [ 60 ] found a decreased hazard ratio for the incidence of prostate cancer (HR: 0.69; 95%CI: 0.54, 0.89; p<0.001) in vegetarians as compared to omnivores, based on evidence from 3 studies. In the vegan population, similar results were observed in the only included study (HR: 0.65; 95%CI: 0.49, 0.85; p<0.001).

Group 4: Death by cardiometabolic diseases and cancer

According to Huang et al. [ 46 ], mortality from IHD (RR: 0.71; 95%CI: 0.56, 0.87) was significantly lower in vegetarians than in non-vegetarians, whereas the reductions in mortality from circulatory diseases (RR: 0.84; 95%CI: 0.54, 1.14) and cerebrovascular diseases (RR: 0.88; 95%CI: 0.70, 1.06) did not reach statistical significance.

The analysis by Dinu et al. [ 14 ], performed on 7 prospective studies including 65,058 vegetarians overall, reported a 25% reduced mortality risk from ischemic heart disease (RR: 0.75; 95%CI: 0.68, 0.82; p<0.001), but no significant differences were found in 5 cohort studies in terms of mortality from CVDs, cerebrovascular diseases, or colorectal, breast, prostate, and lung cancer. Regarding vegans, they analyzed 6 cohort studies and found no differences in all-cause mortality, but a significant difference in cancer incidence (RR: 0.85; 95%CI: 0.75, 0.95), indicating a protective effect of vegan diets.

The literature search did not identify studies focusing on mortality risk for cardiometabolic and cancer diseases in vegans.

Quality of the included studies

The quality of the 48 reviews and meta-analyses included in this umbrella review was assessed with the AMSTAR-R tool. Results are reported in S1 Table. Overall, the average quality score was 28, corresponding to medium quality. However, 36 studies (75%) scored between 60% and 90% of the maximum obtainable score and can therefore be considered of good/very good quality. The least satisfied item on the AMSTAR-R grid was #8 (scientific quality of the included studies used to draw conclusions): as many as 19 studies (39.6%) failed to indicate the use of study-level quality analysis to make recommendations. This finding should be read in conjunction with the missing quality assessment in 15 studies (31.3%) (Item #7, scientific quality of the included studies assessed and documented). Item #10, regarding publication bias, was the second least met item: 18 studies (37.5%) did not perform any analysis of this type of bias. Furthermore, 16 studies (33.3%) failed to indicate careful exclusion of duplicates (Item #2) and to report conflicts of interest (Item #11), another important element in the overall quality assessment of these articles. Taken together, these considerations point to a generally low quality of the included publications, lowering the strength of evidence as well as the external validity of the results.
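As a minimal sketch of the percent-of-maximum banding used above, assuming the standard AMSTAR-R structure of 11 items each scored from 1 to 4 (an assumption about the instrument, not a detail reported by this review):

```python
# Assumed AMSTAR-R structure: 11 items, each scored 1-4, maximum total 44.
N_ITEMS = 11
MAX_SCORE = N_ITEMS * 4

def pct_of_max(raw_score: int) -> float:
    """Express a raw AMSTAR-R total as a percentage of the maximum score."""
    return 100 * raw_score / MAX_SCORE

# Under this assumption, the review's average raw score of 28 is ~64% of the
# maximum, and the 60-90% band corresponds to raw totals of roughly 26 to 40.
print(f"{pct_of_max(28):.1f}%")
```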

Discussion

This umbrella review provides an update on the benefits associated with the adoption of A/AFPDs in reducing risk factors associated with the development of cardiometabolic diseases and cancer, considering the adult and pediatric populations, as well as pregnant women.

Compared to omnivorous regimens, vegetarian and vegan diets appear to significantly improve the metabolic profile through the reduction of total and LDL cholesterol [ 14 – 21 , 23 , 25 ], fasting blood glucose and HbA1c [ 14 , 24 , 25 , 37 , 39 – 41 ], and are associated with lower body weight/BMI and reduced levels of inflammation (evaluated by serum CRP levels [ 27 , 30 ]), while the effect on HDL cholesterol, triglycerides, and systolic and diastolic blood pressure remains debated. A much more limited body of literature suggests that vegetarian, but not vegan, diets also reduce ApoB levels, further improving the lipid profile [ 61 ].

It should be noted that, in the majority of cases, people adopting plant-based diets are more prone to engage in healthy lifestyles, including regular physical activity and reduction/avoidance of sugar-sweetened beverages, alcohol and tobacco, which, in association with the dietary modifications mentioned above [ 62 ], lead to a reduction in the risk of ischemic heart disease and related mortality and, to a lesser extent, of other CVDs.

The adoption of vegan diets is known to increase the risk of vitamin B12 deficiency and consequent disorders, for which appropriate supplementation was recommended by a 2016 position paper of the Academy of Nutrition and Dietetics [ 5 ], but apparently does not modify the risk of pregnancy-induced hypertension or gestational diabetes mellitus [ 53 , 54 ].

The three meta-analyses [ 46 , 55 , 57 ] that analyzed the overall incidence of cancer in any form concordantly showed a reduction in risk in vegetarians compared to omnivores. These general results were inconsistent in the analyses stratified by cancer type, which, as expected, involved smaller numbers of events and wider confidence intervals, especially for less prevalent types of cancer.

The stratified analyses in the different reviews did not show any significant difference for bladder, melanoma, kidney, lymphoma, liver, lung, or breast cancer. Conversely, of the three meta-analyses that addressed colorectal cancer [ 55 , 57 , 58 ], two showed a decreased risk, while one found no significant difference between vegetarians and omnivores for the colorectal tract as a whole.

Interestingly, one review [ 55 ] showed how more granular analyses can reveal significant differences in particular cancer subsets, e.g., the colon and proximal colon. Another recent review found significant results for pancreatic cancer [ 59 ].

Our umbrella review seems consistent with other primary evidence linking the consumption of red and processed meats to an increased risk of cancers of the gastrointestinal tract [ 63 ]. The association has two facets: alongside the potential cancer risk conferred by increased red meat consumption, the potential protective effect of increased fruit and vegetable consumption, shown by previous evidence, must also be considered [ 64 ].

It has also been described that vegetarians, in addition to reduced meat intake, ate fewer refined grains, added fats, sweets, snack foods, and caloric beverages than nonvegetarians, and consumed a wider variety of plant foods [ 65 ]. Such a dietary pattern seems responsible for a reduction of hyperinsulinemia, one of the possible diet-related factors in colorectal cancer risk [ 66 , 67 ]. Similarly, some research has suggested that insulin-like growth factors and their binding proteins may relate to cancer risk [ 68 , 69 ]. This dietary pattern should not, however, be regarded as universal, as varying tendencies have been observed among vegetarians and vegans in different studies; less favourable consumption patterns may potentially negate the anticipated beneficial effects of these diets.

Some protective patterns can also be attributed to the bioactive compounds of plant foods, these being primary sources of fiber, carotenoids, vitamins, minerals, and other compounds with plausible anti-cancer properties [ 70 , 71 ]. The proposed mechanisms range from epigenetic regulation [ 72 ] to immunoregulation and antioxidant and anti-inflammatory activity [ 73 , 74 ].

Finally, increased adiposity could be another pathway by which food intake is associated with these types of cancers. Since our umbrella review has demonstrated that vegetarian diets are associated with lower BMI, this might be another concurrent factor in the decreased risk for pancreatic and colorectal cancers in vegetarians.

Inflammatory biomarkers and adiposity play pivotal roles in the genesis of prostate cancer [ 75 , 76 ]; hence, the same etiological pathways might be hypothesized for the increased risk of this cancer in people adopting an omnivorous diet.

The study presents several noteworthy strengths in its methodological approach and thematic focus. It employed a rigorous and comprehensive search strategy involving two major databases, PubMed and Scopus, spanning more than two decades of research from 1st January 2000 to 30th June 2023, thereby ensuring a robust collection of pertinent literature. By using an umbrella review design, the research synthesizes existing systematic reviews and meta-analyses, providing a higher level of evidence and summarizing a vast quantity of information. Furthermore, its alignment with current health concerns, specifically cardiovascular diseases and cancer, makes the study highly relevant to ongoing public health challenges and positions it as a valuable resource for informing preventive measures and dietary guidelines. The deployment of blinded and independent assessments by multiple raters and investigators minimizes bias and reinforces the reliability of the selection, quality assessment, and data extraction processes. Quality assessment was standardized using the revised 11-item AMSTAR-R tool, and transparency was fostered through registration on PROSPERO, enhancing the credibility of the study. Lastly, the detailed analysis and reporting, particularly the extraction of specific health measures such as cholesterol levels, glucose levels, blood pressure, and cancer risks, contribute to the comprehensiveness of the data synthesis.

The main limitations to data analysis and interpretation are intrinsic to the original studies and consist of the wide heterogeneity in sample size, demographic features, and geographical origin of the included subjects; in dietary patterns (not only their quality but also, more importantly and often neglected, quantity, distribution during the day, processing, and cooking methods) and adherence; and in other lifestyle confounders. In this regard, it is worth mentioning that the impact of diet per se on the development of complex disorders (i.e., CVDs and cancer) and related mortality is extremely difficult to assess [ 71 ], especially in large populations characterized by a highly heterogeneous lifestyle. The heterogeneity in dietary and lifestyle habits among countries should also be considered: the adoption of A/AFPDs could significantly modify habits in some countries but not in others, and consequently have a very different impact on the risk of developing cardiometabolic disorders and cancer [ 25 ]. Furthermore, due to the nature of umbrella reviews, the present work may not include novel associations that were excluded from the analyzed reviews, as the main aim was to summarize secondary studies, such as reviews and meta-analyses. Finally, studies assessing the benefit of A/AFPDs on cancer risk are also limited by heterogeneity in the timing of oncological evaluation and, therefore, disease progression, as well as in histological subtypes and previous/concomitant treatments [ 72 – 75 ].

In conclusion, this umbrella review offers valuable insights into the reduction of risk factors for cardiometabolic diseases and cancer, and of CVD-associated mortality, achievable through the adoption of plant-based diets acting via pleiotropic mechanisms. Through the improvement of the glycolipid profile and the reduction of body weight/BMI, blood pressure, and systemic inflammation, A/AFPDs significantly reduce the risk of ischemic heart disease, gastrointestinal and prostate cancer, as well as related mortality.

However, these data should be interpreted with caution because of the important methodological limitations associated with the original studies. Moreover, the potential risks associated with insufficient intake of vitamins and other nutrients due to unbalanced and/or extremely restricted dietary regimens, together with specific patient needs, should be considered, while promoting research on new and more specific markers (e.g., biochemical, genetic, and epigenetic markers; microbiota profile) recently associated with cardiometabolic and cancer risk, before suggesting A/AFPDs on a large scale.

Supporting information

S1 Table. R-AMSTAR.

https://doi.org/10.1371/journal.pone.0300711.s001

S2 Table. PRISMA 2020 checklist.

https://doi.org/10.1371/journal.pone.0300711.s002

  • Open access
  • Published: 16 May 2024

Promoting equality, diversity and inclusion in research and funding: reflections from a digital manufacturing research network

  • Oliver J. Fisher 1 ,
  • Debra Fearnshaw 2 ,
  • Nicholas J. Watson 3 ,
  • Peter Green 4 ,
  • Fiona Charnley 5 ,
  • Duncan McFarlane 6 &
  • Sarah Sharples 2  

Research Integrity and Peer Review volume  9 , Article number:  5 ( 2024 ) Cite this article

201 Accesses

1 Altmetric

Metrics details

Equal, diverse, and inclusive teams lead to higher productivity, creativity, and greater problem-solving ability resulting in more impactful research. However, there is a gap between equality, diversity, and inclusion (EDI) research and practices to create an inclusive research culture. Research networks are vital to the research ecosystem, creating valuable opportunities for researchers to develop their partnerships with both academics and industrialists, progress their careers, and enable new areas of scientific discovery. A feature of a network is the provision of funding to support feasibility studies – an opportunity to develop new concepts or ideas, as well as to ‘fail fast’ in a supportive environment. The work of networks can address inequalities through equitable allocation of funding and proactive consideration of inclusion in all of their activities.

This study proposes a strategy to embed EDI within research network activities and funding review processes. This paper evaluates 21 planned mitigations introduced to address known inequalities within research events and in how funding is awarded. EDI data were collected from researchers engaging in a digital manufacturing network's activities and funding calls to measure the impact of the proposed method.

Quantitative analysis indicates that the network's approach was successful in creating a more ethnically diverse network, engaging with early career researchers, and supporting researchers with care responsibilities. However, more work is required to create a gender balance across the network activities and to ensure the representation of academics who declare a disability. Preliminary findings suggest the network's anonymous funding review process has helped address inequalities in funding award rates for women and those with care responsibilities; however, more data are required to validate these observations and to understand the impact of different interventions individually and in combination.

Conclusions

In summary, this study offers compelling evidence regarding the efficacy of a research network's approach in advancing EDI within research and funding. The network hopes that these findings will inform broader efforts to promote EDI in research and funding and that researchers, funders, and other stakeholders will be encouraged to adopt evidence-based strategies for advancing this important goal.

Peer Review reports

Introduction

Achieving equality, diversity, and inclusion (EDI) is an underpinning contributor to human rights, civilisation and society-wide responsibility [ 1 ]. Furthermore, promoting and embedding EDI within research environments is essential to make the advancements required to meet today’s research challenges [ 2 ]. This is evidenced by equal, diverse and inclusive teams leading to higher productivity, creativity and greater problem-solving ability [ 3 ], which increases the scientific impact of research outputs and researchers [ 4 ]. However, there remains a gap between EDI research and the everyday implementation of inclusive practices to achieve change [ 5 ]. This paper presents and reflects on the EDI measures trialled by the UK Engineering and Physical Sciences Research Council (EPSRC) funded digital manufacturing research network, Connected Everything (grant number: EP/S036113/1) [ 6 ]. The EPSRC is a UK research council that funds engineering and physical sciences research. By sharing these reflections, this work aims to contribute to the wider effort of creating an inclusive research culture. The perceptions of equality, diversity, and inclusion may vary among individuals. For the scope of this study, the following definitions are adopted:

Equality: Equality is about ensuring that every individual has an equal opportunity to make the most of their lives and talents. No one should have poorer life chances because of the way they were born, where they come from, what they believe, or whether they have a disability.

Diversity: Diversity concerns understanding that each individual is unique, recognising our differences, and exploring these differences in a safe, positive, and nurturing way to value each other as individuals.

Inclusion: Inclusion is an effort and practice in which groups or individuals with different backgrounds are culturally and socially accepted, welcomed and treated equally. This concerns treating each person as an individual, making them feel valued and supported, and being respectful of who they are.

Research networks have varied goals, but a common purpose is to create new interdisciplinary research communities by fostering interactions between researchers and appropriate scientific, technological and industrial groups. These networks aim to offer valuable career progression opportunities for researchers, through access to research funding, forming academic and industrial collaborations at network events, personal and professional development, and research dissemination. However, feedback from a 2021 survey of 19 UK research networks suggests that these networks are not always diverse, and whilst on the face of it they seem inclusive, they are perceived as less inclusive by minority groups (including non-males, those with disabilities, and ethnic minority respondents) [ 7 ]. The exclusivity of these networks further exacerbates inequality within the academic community, as it prevents certain groups from engaging with all aspects of network activities.

Research investigating the causes of inequality and exclusivity has identified several suggestions to make research culture more inclusive, including improving diverse representation within event programmes and panels [ 8 , 9 ]; ensuring events are accessible to all [ 10 ]; providing personalised resources and training to build capacity and increase engagement [ 11 ]; educating institutions and funders to understand and address the barriers to research [ 12 ]; and increasing diversity in peer review and funding panels [ 13 ]. Universities, research institutions and research funding bodies are increasingly taking responsibility to ensure the health of the research and innovation system and to foster inclusion. For example, the EPSRC has set out their own ‘Expectation for EDI’ to promote the formation of a diverse and inclusive research culture [ 14 ]. To drive change, there is an emphasis on the importance of measuring diversity and links to measured outcomes to benchmark future studies on how interventions affect diversity [ 5 ]. Further, collecting and sharing EDI data can also drive aspirations, provide a target for actions, and allow institutions to consider common issues. However, there is a lack of available data regarding the impact of EDI practices on diversity that presents an obstacle, impeding the realisation of these benefits and hampering progress in addressing common issues and fostering diversity and inclusion [ 5 ].

Funding acquisition is important to an academic's career progression, yet funding may often be awarded in ways that feel unequal and/or non-transparent. Because funding matters so much to career progression, if credit for obtaining it is not recognised appropriately, careers can be damaged; moreover, without recognition of everyone involved in successful research, funding bodies lack a complete picture of the research community and are unable to deliver the best value for money [ 15 ]. Awarding funding is often a key research network activity and an area where networks can have a positive impact on the wider research community. It is therefore important that practices are established to embed EDI considerations within the funding process and to ensure that network funding is awarded without bias. Recommendations from the literature to make the funding award process fairer include: ensuring a diverse funding panel; funders instituting reviewer anti-bias training; anonymous review; and/or automatic adjustments to correct for known biases [ 16 ]. In the UK, the government organisation UK Research and Innovation (UKRI), tasked with overseeing research and innovation funding, has pledged to publish data to enhance transparency. This initiative aims to furnish an evidence base for designing interventions and evaluating their efficacy. While the data show some positive signs (e.g., the award rates for male and female PI applicants were equal at 29% in 2020–21), Ottoline Leyser (UKRI Chief Executive) highlights the 'persistent pernicious disparities for under-represented groups in applying for and winning research funding' [ 17 ]. This suggests that a more radical approach to rethinking the traditional funding review process may be required.

This paper describes the approach taken by the ‘Connected Everything’ EPSRC-funded Network to embed EDI in all aspects of its research funding process, and evaluates the impact of this ambition, leading to recommendations for embedding EDI in research funding allocation.

Connected everything’s equality diversity and inclusion strategy

Connected Everything aims to create a multidisciplinary community of researchers and industrialists to address key challenges associated with the future of digital manufacturing. The network is managed by an investigator team responsible for strategic planning and, working with the network manager, for overseeing the delivery of key activities. The network was first funded between 2016–2019 (grant number: EP/P001246/1) and was subsequently awarded a second grant (grant number: EP/S036113/1). The network activities are based around three goals: building partnerships, developing leadership and accelerating impact.

The Connected Everything network represents a broad range of disciplines, including manufacturing, computer science, cybersecurity, engineering, human factors, business, sociology, innovation and design. Some of the subject areas, such as Computer Science and Engineering, tend to be male-dominated (e.g., in 2021/22, a total of 185,42 higher education student enrolments in engineering & technology subjects was broken down as 20.5% Female and 79.5% Male [ 18 ]). The networks also face challenges in terms of accessibility for people with care responsibilities and disabilities. In 2019, Connected Everything committed to embedding EDI in all its network activities and published a guiding principle and goals for improving EDI (see Additional file 1 ). When designing the processes to deliver the second iteration of Connected Everything, the team identified several sources of potential bias/exclusion which have the potential to impact engagement with the network. Based on these identified factors, a series of mitigation interventions were implemented and are outlined in Table  1 .

Connected Everything's anonymous review process

A key Connected Everything activity is the funding of feasibility studies to enable cross-disciplinary, foresight, speculative and risky early-stage research, with a focus on low technology-readiness levels. Awards are made via a short, written application followed by a pitch to a multidisciplinary diverse panel including representatives from industry. Six- to twelve-month-long projects are funded to a maximum value of £60,000.

The current peer-review process used by funders may reveal the applicants’ identities to the reviewer. This can introduce dilemmas to the reviewer regarding (a) deciding whether to rely exclusively on information present within the application or search for additional information about the applicants and (b) whether or not to account for institutional prestige [ 34 ]. Knowing an applicant’s identity can bias the assessment of the proposal, but by focusing the assessment on the science rather than the researcher, equality is more frequently achieved between award rates (i.e., the proportion of successful applications) [ 15 ]. To progress Connected Everything’s commitment to EDI, the project team created a 2-stage review process, where the applicants’ identity was kept anonymous during the peer review stage. This anonymous process, which is outlined in Fig.  1 , was created for the feasibility study funding calls in 2019 and used for subsequent funding calls.

Figure 1. Connected Everything's anonymous review process [EDI: Equality, diversity, and inclusion]

To facilitate the anonymous review process, the proposal was submitted in two parts: part A, the research idea, and part B, the capability-to-deliver statement. All proposals were first anonymously reviewed by a random selection of two members of the Connected Everything executive group, a diverse group of digital manufacturing experts and peers from academia, industry and research institutions who provide guidance and leadership on Connected Everything activities. The reviewers rated the proposals against the selection criteria (see Additional file 1 , Table 1) and provided overall comments alongside a recommendation on whether the applicant should be invited to the panel pitch. This information was summarised and shared with a moderation sift panel, made up of a minimum of two Connected Everything investigators and a minimum of one member of the executive group, which tensioned the reviewers' comments (i.e., carefully weighed the reviewers' comments and evaluations against each other) and ultimately decided which proposals to invite to the panel. This tensioning process included using the identifying information in part B to ensure the applicants had the capability to deliver the project. If this remained unclear, the applicants were asked to confirm expertise in an area the moderation sift panel thought was key, or asked to bring additional expertise into the project team for the panel pitch.
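A minimal sketch of this anonymised first stage is shown below; the data structures and names are hypothetical and do not reflect Connected Everything's actual tooling.

```python
import random

# Hypothetical two-part proposals: Part A (research idea) is reviewed
# anonymously; Part B (capability to deliver) is withheld until moderation.
proposals = [
    {"id": "P01", "part_a_research_idea": "...", "part_b_capability": "..."},
    {"id": "P02", "part_a_research_idea": "...", "part_b_capability": "..."},
]
executive_group = ["member_1", "member_2", "member_3", "member_4"]

def assign_reviewers(proposal: dict, pool: list, n: int = 2):
    """Randomly draw n reviewers, who see only the anonymised Part A."""
    reviewers = random.sample(pool, n)
    anonymised_view = {"id": proposal["id"],
                       "research_idea": proposal["part_a_research_idea"]}
    return reviewers, anonymised_view

for p in proposals:
    reviewers, view = assign_reviewers(p, executive_group)
    print(p["id"], "->", reviewers)
```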

During stage two the applicants were invited to pitch their research idea to a panel of experts who were selected to reflect the diversity of the community. The proposals, including applicants’ identities, were shared with the panel at least two weeks ahead of the panel. Individual panel members completed a summary sheet at the end of the pitch session to record how well the proposal met the selection criteria (see Additional file 1 , Table 1). Panel members did not discuss their funding decision until all the pitches had been completed. A panel chair oversaw the process but did not declare their opinion on a specific feasibility study unless the panel could not agree on an outcome. The panel and panel chair were reminded to consider ways to manage their unconscious bias during the selection process.

Due to the positive response received regarding the anonymous review process, Connected Everything extended its use when reviewing other funded activities. As these awards were for smaller grant values (~ £5,000), it was decided that no panel pitch was required, and the researcher’s identity was kept anonymous for the entire process.

Data collection and analysis methods

Data collection

Equality, diversity and inclusion data were voluntarily collected from applicants for Connected Everything research funding and from participants who won scholarships to attend Connected Everything funded activities. Responses to the EDI data requests were collected from nine Connected Everything coordinated activities between 2019 and 2022. Data requests were sent after the applicant had applied for Connected Everything funding or had attended a Connected Everything funded activity. All data requests were completed voluntarily, with reassurance given that completion of the data request in no way affected their application. In total, 260 responses were received, of which the three feasibility study calls accounted for 56.2%. Overall, the response rate was 73.8%.

To understand the diversity of participants engaging with Connected Everything activities and funding, the data requests asked for details of specific diversity characteristics: gender, transgender, disability, ethnicity, age, and care responsibilities. Although sex and gender are terms that are often used interchangeably, they are two different concepts. To clarify, the definitions used by the UK government describe sex as a set of biological attributes that is generally limited to male or female, and typically attributed to individuals at birth. In contrast, gender identity is a social construction related to behaviours and attributes, and is self-determined based on a person’s internal perception, identification and experience. Transgender is a term used to describe people whose gender identity is not the same as the sex they were registered at birth. Respondents were first asked to identify their gender and then whether their gender was different from their birth sex.

For this study, respondents were asked to (voluntarily) self-declare whether they consider themselves to be disabled or not. Ethnicity within the data requests was based on the 2011 census classification system. When reporting ethnicity data, this study followed the AdvanceHE example of aggregating the census categories into six groups to enable benchmarking against the available academic ethnicity data. AdvanceHE is a UK charity that works to improve the higher education system for staff, students and society. However, it was acknowledged that there are limitations with this grouping, including the assumption that minority ethnic staff or students are a homogenous group [ 16 ]. Therefore, this study made sure to break down these groups during the discussion of the results. The six groups are:

Asian: Asian/Asian British: Indian, Pakistani, Bangladeshi, and any other Asian background;

Black: Black/African/Caribbean/Black British: African, Caribbean, and any other Black/African/Caribbean background;

Chinese: Chinese;

Mixed: mixed/multiple ethnic backgrounds;

Other: other ethnic backgrounds, including Arab;

White: all white ethnic groups.
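A sketch of how such an aggregation might be implemented is shown below; the category strings are abbreviated, hypothetical stand-ins for the full 2011 census wording.

```python
# Hypothetical mapping from (abbreviated) 2011-census ethnicity answers to
# the six aggregated groups used in this study.
CENSUS_TO_GROUP = {
    "Asian/Asian British: Indian": "Asian",
    "Asian/Asian British: Pakistani": "Asian",
    "Black/Black British: African": "Black",
    "Black/Black British: Caribbean": "Black",
    "Chinese": "Chinese",
    "Mixed: White and Black Caribbean": "Mixed",
    "Mixed: White and Asian": "Mixed",
    "Other ethnic group: Arab": "Other",
    "White: British": "White",
}

def aggregate(census_answer: str) -> str:
    """Collapse a census-level answer into one of the six reporting groups."""
    return CENSUS_TO_GROUP.get(census_answer, "Other")
```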

Benchmarking data

Published data from the Higher Education Statistics Agency [ 26 ] (a UK organisation responsible for collecting, analysing, and disseminating data related to higher education institutions and students), UKRI funding data [ 19 , 35 ] and 2011 census data [ 36 ] were used to benchmark the EDI data collected within this study. The responses were compared to the engineering and technology cluster of academic disciplines, as this cluster is most representative of Connected Everything's main funder, the EPSRC. The Higher Education Statistics Agency defines the engineering and technology cluster as including the following subject areas: general engineering; chemical engineering; mineral, metallurgy & materials engineering; civil engineering; electrical, electronic & computer engineering; mechanical, aero & production engineering; and IT, systems sciences & computer software engineering [ 37 ].

When assessing equality in funding award rates, previous studies have focused on analysing the success rates of only the principal investigators [ 15 , 16 , 38 ]; however, Connected Everything recognised that writing research proposals is a collaborative task, so requested diversity data from the whole research team. The average of the last six years of published principal investigator and co-investigator diversity data for UKRI and EPSRC funding awards (2015–2021) was used to benchmark the Connected Everything funding data [ 35 ]. The UKRI and EPSRC funding review process includes a peer review stage followed by a panel pitch and assessment stage; however, unlike the Connected Everything review process, the applicant's track record is assessed during the peer review stage.

The data collected have been used to evaluate the success of the planned mitigations to address EDI factors affecting the higher education research ecosystem, as outlined in Table  1 (see "Connected Everything's equality, diversity and inclusion strategy" section).

Dominance of small number of research-intensive universities receiving funding from network

The dominance of a small number of research-intensive universities receiving funding from a network can have implications for the field of research, including the unequal distribution of resources, a lack of diversity in research, limited collaboration opportunities, and impacts on innovation and progress. Analysis of published EPSRC funding data between 2015 and 2021 [ 19 ] shows that funding has been predominantly (74.1%, 95% CI [71.3%, 76.9%] of £3.98 billion) awarded to Russell Group universities. The Russell Group is a self-selected association of 24 research-intensive universities (out of the 174 universities in the UK), established in 1994. Evaluation of the universities that received Connected Everything feasibility study funding between 2016–2019 shows that Connected Everything awarded just over half (54.6%, 95% CI [25.1%, 84.0%] of 11 awards) to Russell Group universities. Figure  2 shows that the Connected Everything funding awarded to Russell Group universities fell to 44.4%, 95% CI [12.0%, 76.9%] of 9 awards between 2019–2022.

Figure 2. A comparison of funding awarded by EPSRC (total = £3.98 billion) across Russell Group and non-Russell Group universities, alongside the allocations for Connected Everything I (total = £660k) and Connected Everything II (total = £540k)

Dominance of successful applications from men

The percentage-point differences between the award rates of researchers who identified as female, declared a disability, identified as an ethnic minority, or reported care responsibilities and those of their respective counterparts have been plotted in Fig.  3 . Bars to the right of the axis mean that the award rate of female/declared-disability/ethnic-minority/carer applicants is greater than that of male/no-disability/white/non-carer applicants.

Figure 3. Percentage point (PP) differences in award rate by funding provider for gender, disability status, ethnicity and care responsibilities (data not collected by UKRI and EPSRC [ 35 ]). The total numbers of applicants for each funder are: Connected Everything = 146, EPSRC = 37,960, and UKRI = 140,135. *The numbers of applicants were too small (< 5) to enable a meaningful discussion

Figure  3 (A) shows that between 2015 and 2021 research team applicants who identified as male had a higher award rate than those who identified as female when applying for EPSRC and wider UKRI research council funding. Connected Everything funding applicants who identified as female achieved a higher award rate (19.4%, 95% CI [6.5%, 32.4%] out of 146) than male applicants (15.6%, 95% CI [8.8%, 22.4%] out of 146). These data suggest that biases have been reduced by the Connected Everything review process and other mitigation strategies (e.g., visible gender diversity among panel pitch members and publishing the Connected Everything principles and goals to demonstrate commitment to equality and fairness). This finding aligns with an earlier study that found gender bias during the peer review process, resulting in female investigators receiving less favourable evaluations than their male counterparts [ 15 ].

Over-representation of people identifying as male in engineering and technology academic community

Figure  4 shows the responses to the gender question, with 24.2%, 95% CI [19.0%, 29.4%] of 260 respondents identifying as female. This aligns with the average for the engineering and technology cluster (21.4%, 95% CI [20.9%, 21.9%] female of 27,740 academic staff), which includes subject areas representative of our main funder, the EPSRC [ 22 ]. We also sought to understand the representation of transgender researchers within the network. However, following the rounding policy outlined by UK Government statistics policies and procedures [ 39 ], the number of respondents who identified as a gender different from their birth sex was too low (< 5) to enable a meaningful discussion.

Figure 4. Gender question responses from a total of 260 respondents

Dominance of successful applications from white academics

Figure  3 (C) shows that researchers with a minority ethnicity consistently have a lower award rate than white researchers when applying for EPSRC and UKRI funding. Similarly, the results in Fig.  3 (C) indicate that white researchers are more successful (by 8.0 percentage points, 95% CI [-8.6%, 24.6%]) when applying for Connected Everything funding. These results indicate that more measures should be implemented to support ethnic minority researchers applying for Connected Everything funding, as well as sense-checking that there is no unconscious bias in any of the Connected Everything funding processes. The breakdown of the ethnic diversity of applicants at different stages of the Connected Everything review process (i.e., all applications, applicants invited to panel pitch, and awarded feasibility studies) has been plotted in Fig.  5 to help identify where more support is needed. Figure  5 shows an increase in the proportion of white researchers from 54%, 95% CI [45.4%, 61.8%] of all 146 applicants to 66%, 95% CI [52.8%, 79.1%] of the 50 researchers invited to the panel pitch. This suggests that stage 1 of the Connected Everything review process (anonymous review of written applications) may favour white applicants and/or introduce unconscious bias into the process.

Figure 5. Ethnicity question responses from different stages of the Connected Everything anonymous review process. The total number of applicants is 146, with 50 at the panel stage and 23 ultimately awarded

Under-representation of those from black or minority ethnic backgrounds

Connected Everything appears to engage a wide range of ethnic diversity, as shown in Fig.  6 . The ethnicities Asian (18.3%, 95% CI [13.6%, 23.0%]), Black (5.1%, 95% CI [2.4%, 7.7%]), Chinese (12.5%, 95% CI [8.4%, 16.5%]), mixed (3.5%, 95% CI [1.3%, 5.7%]) and other (7.8%, 95% CI [4.5%, 11.1%]) have a higher representation among the 260 individuals engaging with the network's activities than in both the engineering and technology academic community and the wider UK population. When separating these groups into the original ethnic diversity answers, it becomes apparent that there is no engagement with 'Black or Black British: Caribbean', 'Mixed: White and Black Caribbean' or 'Mixed: White and Asian' researchers within Connected Everything activities. The lack of engagement with researchers of Caribbean heritage is symptomatic of a lack of representation within the UK research landscape [ 25 ].

Figure 6. Ethnicity question responses from a total of 260 respondents, compared to the distribution of the 13,085 UK engineering and technology (E&T) academic staff [ 22 ] and the 56 million people recorded in the UK 2011 census [ 36 ]

Under-representation of disabilities, chronic conditions, invisible illnesses and neurodiversity in funded activities and events

Figure  7 (A) shows that 5.7%, 95% CI [2.4%, 8.9%] of 194 respondents declared a disability. This is higher than the average for engineering and technology academics who identify as disabled (3.4%, 95% CI [3.2%, 3.7%] of 27,730 academics). Between January and March 2022, 9.0 million people of working age (16–64) within the UK were identified as disabled by the Office for National Statistics [ 40 ], which is 21% of the working-age population [ 27 ]. Considering these statistics, there is a stark under-representation of disabilities, chronic conditions, invisible illnesses and neurodiversity among engineering and technology academic staff and those engaging in Connected Everything activities.

Figure 7. Responses to (A) disability and (B) care responsibilities questions, collected from a total of 194 respondents

Between 2015 and 2020 academics that declared a disability have been less successful than academics without a disability in attracting UKRI and EPSRC funding, as shown in Fig.  3 (B). While Fig.  3 (B) shows that those who declare a disability have a higher Connected Everything funding award rate, the number of applicants who declared a disability was too small (< 5) to enable a meaningful discussion regarding this result.

Under-representation of those with care responsibilities in funded activities and events

In response to the care responsibilities question, Fig.  7 (B) shows that 27.3%, 95% CI [21.1%, 33.6%] of 194 respondents identified as carers, which is higher than the 6% of adults estimated to be providing informal care across the UK in a UK Government survey of the 2020/2021 financial year [ 41 ]. However, the 'informal care' definition used by the 2021 survey includes unpaid care to a friend or family member needing support, perhaps due to illness, older age, disability, a mental health condition or addiction [ 41 ]. The Connected Everything survey included care responsibilities across the spectrum of care that includes partners, children, other relatives, pets, friends and kin. It is important to consider a wide spectrum of care responsibilities, as key academic events, such as conferences, have previously been demonstrably exclusionary sites for academics with care responsibilities [ 42 ]. Breakdown analysis of the responses to care responsibilities by gender in Fig.  8 reveals that 37.8%, 95% CI [25.3%, 50.3%] of 58 women respondents reported care responsibilities, compared to 22.6%, 95% CI [15.6%, 29.6%] of 136 men respondents. Our findings reinforce similar studies that conclude the burden of care falls disproportionately on female academics [ 43 ].

Figure 8. Responses to care responsibilities when grouped by (A) 136 males and (B) 58 females

Figure  3 (D) shows that researchers with care responsibilities applying for Connected Everything funding have a higher award rate than researchers without care responsibilities. These results suggest that the Connected Everything review process is supportive of researchers with care responsibilities, who have faced barriers in other areas of academia.

Reduced opportunities for ECRs

Early-career researchers (ECRs) represent the transition stage between starting a PhD and senior academic positions. The EPSRC defines an ECR as someone who is either within eight years of their PhD award, or equivalent professional training, or within six years of their first academic appointment [ 44 ]. These periods exclude any career break, for example due to family care, health reasons, or reasons related to COVID-19 such as home schooling or an increased teaching load. The median age for starting a PhD in the UK is 24 to 25, while PhDs usually last between three and four years [ 45 ]. These data therefore imply that the median age range of EPSRC-defined ECRs is between 27 and 37 years. It should be noted, however, that this definition is not ideal and excludes ECRs who may have started their research career later in life.

Connected Everything aims to support ECRs via measures that include mentoring support, workshops, summer schools and podcasts. Figure  9 shows a greater representation of researchers aged 30–44 among those engaging with Connected Everything activities (62.4%, 95% CI [55.6%, 69.2%] of 194 respondents) when compared to the wider engineering and technology academic community (43.7%, 95% CI [43.1%, 44.3%] of 27,780 academics) and the UK population (26.9%, 95% CI [26.9%, 26.9%]).

Figure 9. Age question responses from a total of 194 respondents, compared to the distribution of the 27,780 UK engineering and technology (E&T) academic staff [ 22 ] and the 56 million people recorded in the UK 2011 census [ 36 ]

High competition for funding has a greater impact on ECRs

Figure  10 shows that the largest age bracket applying for and winning Connected Everything funding is 31–45, whereas 72%, 95% CI [70.1%, 74.5%] of the 12,075 researchers awarded EPSRC grants between 2015 and 2021 were 40 years or older. These results suggest that the measures introduced by Connected Everything have been successful at providing funding opportunities for researchers who are likely to be at an early-to-mid career stage.

Figure 10. Age of researchers at the applicant and awarded stages for (A) Connected Everything funding between 2019–2022 (146 applicants, 23 awarded) and (B) EPSRC funding between 2015–2021 [ 35 ] (35,780 applicants, 12,075 awarded)

The results of this paper provide insights into the impact that Connected Everything’s planned mitigations have had on promoting equality, diversity, and inclusion (EDI) in research and funding. Collecting EDI data from individuals who engage with network activities and apply for research funding enabled an evaluation of whether these mitigations have been successful in achieving the intended outcomes outlined at the start of the study, as summarised in Table  2 .

The results in Table 2 indicate that Connected Everything’s approach to EDI has helped achieve the intended outcome of improving the representation of women, ECRs, those with a declared disability and those from black/minority ethnic backgrounds engaging with network events, when compared to the engineering and technology academic community. In addition, the network has helped raise awareness of the high proportion of researchers with care responsibilities at network events, which can help track progress towards making future events inclusive and accessible for carers. The data highlight two areas for improvement: (1) ensuring a gender balance; and (2) increasing the representation of those with declared disabilities. Both discrepancies are indicative of wider imbalances and underrepresentation of these groups in the engineering and technology academic community [ 26 ], yet represent areas where networks can strive to make a difference. Possible strategies include targeted outreach, greater representation of these groups among event speakers, and going further to create a welcoming and inclusive environment. One barrier that can disproportionately affect women researchers is the need to balance care responsibilities with attending network events [ 46 ]. This was reflected in the Connected Everything data, in which 37.8%, 95% CI [25.3%, 50.3%] of women engaging with network activities reported care responsibilities, compared to 22.6%, 95% CI [15.6%, 29.6%] of men. Providing accommodations such as on-site childcare, flexible scheduling, or virtual attendance options can therefore help promote inclusivity and allow more women researchers to attend.

Only 5.7%, 95% CI [2.4%, 8.9%] of respondents engaging with Connected Everything declared a disability; although this is higher than in the engineering and technology academic community (3.4%, 95% CI [3.2%, 3.7%]) [ 26 ], it remains unrepresentative of the wider UK population. It has been suggested that academics can be uncomfortable declaring disabilities because scholarly contributions and institutional citizenship are so prized that they feel unable to be honest about their health concerns and instead keep them secret [ 47 ]. In research networks, it is important to be mindful of this hidden group within higher education and ensure that measures are put in place to make the network’s activities inclusive to all. Future accommodations to improve the inclusivity of research events include: improving the physical accessibility of venues; providing assistive technology (such as screen readers, audio descriptions, and captioning) to help individuals with visual or hearing impairments access and participate; providing sign language interpreters; offering flexible scheduling options; and providing quiet rooms, written materials in accessible formats, and support staff trained to work with individuals with cognitive disabilities.

Connected Everything introduced measures (e.g., an anonymised reviewing process, Q&A sessions before funding calls, inclusive design of the panel pitch) to help address inequalities in how funding is awarded. Table 2 shows success in reducing the dominance of male-identifying researchers and of research-intensive universities in winning research funding, and shows that researchers with care responsibilities were more successful at winning funding than those without. The data revealed that the proposed measures were unable to address the inequality in award rates between white and ethnic minority researchers, which remains an area for improvement. The inequality appears to arise during the anonymous review stage, with a greater proportion of white researchers being invited to panel. Recommendations to make the review process fairer include: ensuring greater diversity of reviewers; reviewer anti-bias training; and automatic adjustments to correct for known biases in writing style [ 16 , 32 ].

When reflecting on the development of a strategy to embed EDI throughout the network, Connected Everything has learned several key lessons that may benefit other networks undergoing a similar activity. These include:

EDI is never ‘done’: There is a constant need to review approaches to EDI to ensure they remain relevant to the network community. Connected Everything could review its principles to include the concept of justice in its approach to diversity and inclusion. In the context of EDI, justice refers to the removal of systemic barriers that prevent the fair and equitable distribution of resources and opportunities among all members of society, regardless of their individual characteristics or backgrounds. The principles and subsequent actions could be reviewed against the EDI expectations [ 14 ], paying particular attention to areas where barriers may still be present; for example, shifting from welcoming people into existing structures and culture to creating new structures and culture together, with specific emphasis on decision-making or advisory mechanisms within the network. Where removing a barrier is not within the network’s control, this activity could focus on tailored support to overcome it, thereby achieving equity (justice).

Widen diversity categories: By collecting data on a broad range of characteristics, we can identify and address disparities and biases that might otherwise be overlooked. A weakness of this dataset is that it ignores the experiences of those with intersectional identities across race, ethnicity, gender, class, disability and/or LGBTQI status. The Wellcome Trust noted how little is known about the socio-economic background of scientists and researchers [ 48 ].

Collect data on whole research teams: For the first two calls for feasibility study funding, Connected Everything asked only the Principal Investigator to voluntarily provide their data. We realised this was a limited approach and, in the third call, asked for data on the whole research team to be shared anonymously. Furthermore, we do not currently measure the diversity of our event speakers, panellists or reviewers. Collecting these data in the future will help hold the network accountable and ensure that all groups are represented in our activities and in the funding decision-making process.

High response rate: Previous surveys measuring network diversity (e.g., [ 7 ]) have struggled to get responses when surveying their memberships, whereas this study achieved a response rate of 73.8%. We attribute this high response rate to sending EDI data requests at the point of contact with the network (e.g., on submission of a funding proposal or after attendance at a network event), rather than trying to survey the entire network membership at any one point in time.

Improve administration: The administration associated with collecting EDI data requires a commitment to transparency, inclusivity, and continuous improvement. For example, during the first feasibility funding call, Connected Everything made it clear that the review process would be anonymous, but the application form was a single document, which made anonymising the applications extremely time-consuming. For the subsequent calls, separate documents were created: Part A for identifying information (Principal Investigator contact details, project team and industry collaborators) and Part B for the research idea.

Accepting that this can be uncomfortable: Trying to improve EDI can be uncomfortable because it often requires challenging our assumptions, biases, and existing systems and structures. However, it is essential if we want to make real progress towards equity and inclusivity. Creating processes to support embedding EDI takes time, and Connected Everything has found it is rare to get it right the first time. Connected Everything is sharing its learning as widely as possible, both to support others in their approaches and to continue its own learning as it reflects on how to keep improving, even when doing so is challenging.

Enabling individual engagement with EDI: During this work, Connected Everything recognised that methods for engaging with EDI issues in research design and delivery are lacking. Connected Everything, with support from the Future Food Beacon of Excellence at the University of Nottingham, set out to develop a card-based tool [ 49 ] to help researchers and stakeholders identify questions about how their work may promote equity and increase inclusion, or may negatively affect one or more protected groups, and how any negative impact can be overcome. The results have been shared in conference presentations [ 50 ] and will be published later.

While this study provides insights into how EDI can be improved in research network activities and funding processes, it is essential to acknowledge several limitations that may impact the interpretation of the findings.

Sample size and generalisability: A total of 260 responses were received, which may not be representative of our overall network of 500+ members. Nevertheless, these data give a sense of the current diversity of those engaging in Connected Everything activities and funding opportunities, which we can compare with other available data to steer action to further diversify the network.

Handling of missing data: Of the 260 responses, 66 data points were missing for the questions on age, disability, and caring responsibilities. These questions were mistakenly omitted from a Connected Everything summer school survey, which accounts for 62 of the missing data points. While we assumed during analysis that the remainder of the missing data were missing at random, it is important to acknowledge that the missingness could be related to other factors, potentially introducing bias into our results.

Emphasis on quantitative data: The study relies on using quantitative data to evaluate the impact of the EDI measures introduced by Connected Everything. However, relying solely on quantitative metrics may overlook nuanced aspects of EDI that cannot be easily quantified. For example, EDI encompasses multifaceted issues influenced by historical, cultural, and contextual factors. These nuances may not be fully captured by numbers alone. In addition, some EDI efforts may not yield immediate measurable outcomes but still contribute to a more inclusive environment.

Diversity and inclusion are not synonymous: The study proposes 21 measures to contribute towards creating an equal, diverse and inclusive research culture and collects diversity data to measure the impact of these measures. However, while diversity is simpler to monitor, increasing diversity alone does not guarantee equality or inclusion. Even with diverse research groups, individuals from underrepresented groups may still face barriers, microaggressions, or exclusion.

Balancing anonymity and rigour in grant reviews: The anonymous review process proposed by Connected Everything removes personal and organisational details from the research ideas under reviewer evaluation. However, there remains a possibility that a reviewer could discern the identity of the grant applicant from the research idea itself. Reviewers are expected to be subject matter experts in the field relevant to the proposal they are evaluating, and given the specialised nature of scientific research, it is conceivable that a well-known applicant could be identified through the specifics of the work, the methodologies employed, or even the writing style.

Expanding gender identity options: A limitation of this study emerged from the restricted gender options (male, female, other, prefer not to say) provided to respondents when answering the gender identity question. This limitation reflects the context of data collection in 2018, a time when diversity monitoring guidance was still limited. As our understanding of gender identity evolves beyond binary definitions, future data collection efforts should embrace a more expansive and inclusive approach, recognising the diverse spectrum of gender identities.

In conclusion, this study provides evidence of the effectiveness of a research network’s approach to promoting equality, diversity and inclusion (EDI) in research and funding. By collecting EDI data from individuals who engage with network activities and apply for research funding, this study has shown that the network’s initiatives have had a positive impact on representation and fairness in the funding process. Specifically, the analysis reveals that the network is successful at engaging ECRs and those with care responsibilities, and that a diverse range of ethnicities is represented at Connected Everything events. Additionally, network activities have a more equal gender balance and a greater representation of researchers with disabilities than the engineering and technology academic community, though both groups remain underrepresented compared to the national population.

Connected Everything introduced measures to help address inequalities in how funding is awarded. The measures introduced helped reduce the dominance of researchers who identified as male and research-intensive universities in winning research funding. Additionally, researchers with care responsibilities were more successful at winning funding than those without care responsibilities. However, inequality persisted with white researchers achieving higher award rates than those from ethnic minority backgrounds. Recommendations to make the review process fairer include: ensuring greater diversity of reviewers; reviewer anti-bias training; and automatic adjustments to correct for known biases in writing style.

Connected Everything’s approach to embedding EDI in network activities has already been shared widely with other EPSRC-funded networks and Hubs (e.g. the UKRI Circular Economy Hub and the UK Acoustics Network Plus). The network hopes that these findings will inform broader efforts to promote EDI in research and funding and that researchers, funders, and other stakeholders will be encouraged to adopt evidence-based strategies for advancing this important goal.

Availability of data and materials

The data were collected anonymously; however, it may be possible to identify an individual by combining specific records from the data request form. The study data have therefore been presented in aggregate form to protect the confidentiality of individuals, and the data utilised in this study cannot be made openly accessible due to ethical obligations to protect the privacy and confidentiality of the data providers.

Abbreviations

ECR: Early career researcher

EDI: Equality, diversity and inclusion

EPSRC: Engineering and Physical Sciences Research Council

UKRI: UK Research and Innovation

Xuan J, Ocone R. The equality, diversity and inclusion in energy and AI: call for actions. Energy AI. 2022;8:100152.

Guyan K, Oloyede FD. Equality, diversity and inclusion in research and innovation: UK review. Advance HE; 2019.  https://www.ukri.org/wp-content/uploads/2020/10/UKRI-020920-EDI-EvidenceReviewUK.pdf .

Cooke A, Kemeny T. Cities, immigrant diversity, and complex problem solving. Res Policy. 2017;46:1175–85.

AlShebli BK, Rahwan T, Woon WL. The preeminence of ethnic diversity in scientific collaboration. Nat Commun. 2018;9:5163.

Gagnon S, Augustin T, Cukier W. Interplay for change in equality, diversity and inclusion studies. Hum Relat. Epub ahead of print 23 April 2021. https://doi.org/10.1177/00187267211002239 .

Connected Everything. https://connectedeverything.ac.uk/ (Accessed 27 Feb 2023).

Chandler-Wilde S, Kanza S, Fisher O, Fearnshaw D, Jones E. Reflections on an EDI Survey of UK-Government-Funded Research Networks in the UK. In: The 51st International Congress and Exposition on Noise Control Engineering. St. Albans: Institute of Acoustics; 2022. p. 9.0–940.

Prathivadi Bhayankaram K, Prathivadi Bhayankaram N. Conference panels: do they reflect the diversity of the NHS workforce? BMJ Lead. 2022;6:57–9.

Goodman SW, Pepinsky TB. Gender representation and strategies for panel diversity: Lessons from the APSA Annual Meeting. PS Polit Sci Polit 2019;52:669–676.

Olsen J, Griffiths M, Soorenian A, et al. Reporting from the margins: disabled academics reflections on higher education. Scand J Disabil Res. 2020;22:265–74.

Baldie D, Dickson CAW, Sixsmith J. Building an Inclusive Research Culture. In: Knowledge, Innovation, and Impact. 2021, pp. 149–157.

Sato S, Gygax PM, Randall J, et al. The leaky pipeline in research grant peer review and funding decisions: challenges and future directions. High Educ. 2021;82:145–62.

Recio-Saucedo A, Crane K, Meadmore K, et al. What works for peer review and decision-making in research funding: a realist synthesis. Res Integr Peer Rev. 2022;7:2.

EPSRC. Expectations for equality, diversity and inclusion – UKRI, https://www.ukri.org/about-us/epsrc/our-policies-and-standards/equality-diversity-and-inclusion/expectations-for-equality-diversity-and-inclusion/ (2022, Accessed 26 Apr 2022).

Witteman HO, Hendricks M, Straus S, et al. Are gender gaps due to evaluations of the applicant or the science? A natural experiment at a national funding agency. Lancet. 2019;393:531–40.

Li YL, Bretscher H, Oliver R, et al. Racism, equity and inclusion in research funding. Sci Parliam. 2020;76:17–9.

UKRI. UKRI publishes latest diversity data for research funding, https://www.ukri.org/news/ukri-publishes-latest-diversity-data-for-research-funding/ (Accessed 28 July 2022).

Higher Education Statistics Agency. What do HE students study? https://www.hesa.ac.uk/data-and-analysis/students/what-study (2023, Accessed 25 March 2023).

UKRI. Competitive funding decisions, https://www.ukri.org/what-we-offer/what-we-have-funded/competitive-funding-decisions/ (2023, Accessed 2 April 2023).

Santos G, Van Phu SD. Gender and academic rank in the UK. Sustain. 2019;11:3171.

Jebsen JM, Nicoll Baines K, Oliver RA, et al. Dismantling barriers faced by women in STEM. Nat Chem. 2022;14:1203–6.

Advance HE. Equality in higher education: staff statistical report 2021 | Advance HE, https://www.advance-he.ac.uk/knowledge-hub/equality-higher-education-statistical-report-2021 (28 October 2021, Accessed 26 April 2022).

EngineeringUK. Engineering in Higher Education, https://www.engineeringuk.com/media/318874/engineering-in-higher-education_report_engineeringuk_march23_fv.pdf (2023, Accessed 25 March 2023).

Bhopal K. Academics of colour in elite universities in the UK and the USA: the ‘unspoken system of exclusion’. Stud High Educ. 2022;47:2127–37.

Williams P, Bath S, Arday J, et al. The Broken Pipeline: Barriers to Black PhD Students Accessing Research Council Funding. 2019.

HESA. Who’s working in HE? Personal characteristics, https://www.hesa.ac.uk/data-and-analysis/staff/working-in-he/characteristics (2023, Accessed 1 April 2023).

Office for National Statistics. Principal projection - UK population in age groups, https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationprojections/datasets/tablea21principalprojectionukpopulationinagegroups (2022, Accessed 3 August 2022).

HESA. Who’s studying in HE? Personal characteristics, https://www.hesa.ac.uk/data-and-analysis/students/whos-in-he/characteristics (2023, Accessed 1 April 2023).

Herman E, Nicholas D, Watkinson A, et al. The impact of the pandemic on early career researchers: what we already know from the internationally published literature. Prof Inf. 2021;30(2). https://doi.org/10.3145/epi.2021.mar.08 .

Moreau M-P, Robertson M. ‘Care-free at the top’? Exploring the experiences of senior academic staff who are caregivers, https://srhe.ac.uk/wp-content/uploads/2020/03/Moreau-Robertson-SRHE-Research-Report.pdf (2019).

Shillington AM, Gehlert S, Nurius PS, et al. COVID-19 and long-term impacts on tenure-line careers. J Soc Social Work Res. 2020;11:499–507.

de Winde CM, Sarabipour S, Carignano H et al. Towards inclusive funding practices for early career researchers. J Sci Policy Gov; 18. Epub ahead of print 24 March 2021. https://doi.org/10.38126/JSPG180105 .

Wellcome Trust. Grant funding data report 2018/19, https://wellcome.org/sites/default/files/grant-funding-data-2018-2019.pdf (2020).

Vallée-Tourangeau G, Wheelock A, Vandrevala T, et al. Peer reviewers’ dilemmas: a qualitative exploration of decisional conflict in the evaluation of grant applications in the medical humanities and social sciences. Humanit Soc Sci Commun. 2022;9:1–11.

Diversity data – UKRI. https://www.ukri.org/what-we-offer/supporting-healthy-research-and-innovation-culture/equality-diversity-and-inclusion/diversity-data/ (accessed 30 September 2022).

2011 Census - Office for National Statistics. https://www.ons.gov.uk/census/2011census (Accessed 2 August 2022).

HESA. Cost centres (2012/13 onwards), https://www.hesa.ac.uk/support/documentation/cost-centres/2012-13-onwards (Accessed 28 July 2022).

Viner N, Powell P, Green R. Institutionalized biases in the award of research grants: a preliminary analysis revisiting the principle of accumulative advantage. Res Policy. 2004;33:443–54.

Ofqual. Rounding policy. GOV.UK, https://www.gov.uk/government/publications/ofquals-statistics-policies-and-procedures/rounding-policy (2023, Accessed 2 April 2023).

Office for National Statistics. Labour market status of disabled people, https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/labourmarketstatusofdisabledpeoplea08 (2022, Accessed 3 August 2022).

Family Resources Survey: financial year 2020 to 2021. GOV.UK, https://www.gov.uk/government/statistics/family-resources-survey-financial-year-2020-to-2021 (Accessed 10 Aug 2022).

Henderson E. Academics in two places at once: (not) managing caring responsibilities at conferences. 2018, p. 218.

Jolly S, Griffith KA, DeCastro R, et al. Gender differences in time spent on parenting and domestic responsibilities by high-achieving young physician-researchers. Ann Intern Med. 2014;160:344–53.

UKRI. Early career researchers, https://www.ukri.org/what-we-offer/developing-people-and-skills/esrc/early-career-researchers/ (2022, Accessed 2 April 2023).

Cornell B. PhD Life: The UK student experience , www.hepi.ac.uk (2019, Accessed 2 April 2023).

Kibbe MR, Kapadia MR. Underrepresentation of women at academic medical conferences—“manels” must stop. JAMA Netw Open. 2020;3:e2018676.

Brown N, Leigh J. Ableism in academia: where are the disabled and ill academics? Disabil Soc. 2018;33:985–9. https://doi.org/10.1080/09687599.2018.1455627 .

Bridge Group. Diversity in Grant Awarding and Recruitment at Wellcome Summary Report. 2017.

Craigon P, Fisher O, Fearnshaw D, et al. VERSION 1 - The Equality Diversity and Inclusion cards. figshare; 2022. https://doi.org/10.6084/m9.figshare.21222212.v3 .

Connected Everything II. EDI ideation cards for research - YouTube, https://www.youtube.com/watch?v=GdJjL6AaBbc&ab_channel=ConnectedEverythingII (2022, Accessed 7 June 2023).

Acknowledgements

The authors would like to acknowledge the support of the Engineering and Physical Sciences Research Council (EPSRC) [grant number EP/S036113/1], Connected Everything II: Accelerating Digital Manufacturing Research Collaboration and Innovation. The authors would also like to gratefully acknowledge the Connected Everything Executive Group for their contribution towards developing Connected Everything’s equality, diversity and inclusion strategy.

This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) [grant number EP/S036113/1].

Author information

Authors and Affiliations

Food, Water, Waste Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham, UK

Oliver J. Fisher

Human Factors Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham, UK

Debra Fearnshaw & Sarah Sharples

School of Food Science and Nutrition, University of Leeds, Leeds, UK

Nicholas J. Watson

School of Engineering, University of Liverpool, Liverpool, UK

Peter Green

Centre for Circular Economy, University of Exeter, Exeter, UK

Fiona Charnley

Institute for Manufacturing, University of Cambridge, Cambridge, UK

Duncan McFarlane

Contributions

OJF analysed and interpreted the data and was the lead author in writing and revising the manuscript. DF led the data acquisition, supported the interpretation of the data, and was a major contributor to the design of the equality, diversity and inclusion (EDI) strategy proposed in this work. NJW supported the design of the EDI strategy and was a major contributor in reviewing and revising the manuscript. PG supported the design of the EDI strategy and was a major contributor in reviewing and revising the manuscript. FC supported the design of the EDI strategy and the interpretation of the data. DM supported the design of the EDI strategy. SS led the development of the EDI strategy proposed in this work and was a major contributor in data interpretation and in reviewing and revising the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Debra Fearnshaw .

Ethics declarations

Ethics approval and consent to participate

The research was considered exempt from requiring ethical approval as it uses completely anonymous survey results that are routinely collected as part of the administration of the Network Plus, and informed consent was obtained at the time of original data collection.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Fisher, O.J., Fearnshaw, D., Watson, N.J. et al. Promoting equality, diversity and inclusion in research and funding: reflections from a digital manufacturing research network. Res Integr Peer Rev 9 , 5 (2024). https://doi.org/10.1186/s41073-024-00144-w

Received : 12 October 2023

Accepted : 09 April 2024

Published : 16 May 2024

DOI : https://doi.org/10.1186/s41073-024-00144-w

Keywords

  • Research integrity
  • Network policy
  • Funding reviewing
  • EDI interventions

Research Integrity and Peer Review

ISSN: 2058-8615

  • Original article
  • Open access
  • Published: 20 May 2024

The great detectives: humans versus AI detectors in catching large language model-generated medical writing

  • Jae Q. J. Liu 1 ,
  • Kelvin T. K. Hui 1 ,
  • Fadi Al Zoubi 1 ,
  • Zing Z. X. Zhou 1 ,
  • Dino Samartzis 2 ,
  • Curtis C. H. Yu 1 ,
  • Jeremy R. Chang 1 &
  • Arnold Y. L. Wong   ORCID: orcid.org/0000-0002-5911-5756 1  

International Journal for Educational Integrity volume  20 , Article number:  8 ( 2024 ) Cite this article

4 Altmetric

Metrics details

The application of artificial intelligence (AI) in academic writing has raised concerns regarding accuracy, ethics, and scientific rigour. Some AI content detectors may not accurately identify AI-generated texts, especially those that have undergone paraphrasing. Therefore, there is a pressing need for efficacious approaches or guidelines to govern AI usage in specific disciplines.

Our study aims to compare the accuracy of mainstream AI content detectors and human reviewers in detecting AI-generated rehabilitation-related articles with or without paraphrasing.

Study design

This cross-sectional study purposively chose 50 rehabilitation-related articles from four peer-reviewed journals, and then fabricated another 50 articles using ChatGPT. Specifically, ChatGPT was used to generate the introduction, discussion, and conclusion sections based on the original titles, methods, and results. Wordtune was then used to rephrase the ChatGPT-generated articles. Six common AI content detectors (Originality.ai, Turnitin, ZeroGPT, GPTZero, Content at Scale, and GPT-2 Output Detector) were employed to identify AI content for the original, ChatGPT-generated and AI-rephrased articles. Four human reviewers (two student reviewers and two professorial reviewers) were recruited to differentiate between the original articles and AI-rephrased articles, which were expected to be more difficult to detect. They were instructed to give reasons for their judgements.

Originality.ai correctly detected 100% of ChatGPT-generated and AI-rephrased texts. ZeroGPT accurately detected 96% of ChatGPT-generated and 88% of AI-rephrased articles, with an area under the receiver operating characteristic curve (AUROC) of 0.98 for discriminating between human-written and AI articles. Turnitin showed a 0% misclassification rate for human-written articles, although it identified only 30% of AI-rephrased articles. Professorial reviewers accurately discriminated at least 96% of AI-rephrased articles, but they misclassified 12% of human-written articles as AI-generated. On average, students identified only 76% of AI-rephrased articles. Reviewers identified AI-rephrased articles based on ‘incoherent content’ (34.36%), followed by ‘grammatical errors’ (20.26%), and ‘insufficient evidence’ (16.15%).

Conclusions and relevance

This study directly compared the accuracy of advanced AI detectors and human reviewers in detecting AI-generated medical writing after paraphrasing. Our findings demonstrate that specific detectors and experienced reviewers can accurately identify articles generated by Large Language Models, even after paraphrasing. The rationale employed by our reviewers in their assessments can inform future evaluation strategies for monitoring AI usage in medical education or publications. AI content detectors may be incorporated as an additional screening tool in the peer-review process of academic journals.

Introduction

Chat Generative Pre-trained Transformer (ChatGPT; OpenAI, USA) is a popular and responsive chatbot that has surpassed other Large Language Models (LLMs) in terms of usage (ChatGPT Statistics 2023 ). Trained with 175 billion parameters, ChatGPT has demonstrated capabilities in the field of medicine and digital health (OpenAI 2023 ), and has been reported to solve higher-order reasoning questions in pathology (Sinha 2023 ). It has been used to generate discharge summaries (Patel & Lam 2023 ), to aid diagnosis (Mehnen et al. 2023 ), and to provide health information to patients with cancer (Hopkins et al. 2023 ). ChatGPT has also become a valuable writing assistant, especially in medical writing (Imran & Almusharaf 2023 ).

However, scientists did not support granting ChatGPT authorship in academic publishing because it could not be held accountable for the ethics of the content (Stokel-Walker 2023 ). Its tendency to generate plausible but non-rigorous or misleading content has raised doubts about the reliability of its outputs (Sallam 2023 ; Manohar & Prasad 2023 ). This poses a risk of disseminating unsubstantiated information. Therefore, scholars have been exploring ways to detect AI-generated content to uphold academic integrity, although there are conflicting perspectives on the utilization of detectors in academic publishing. Previous research found that 14 existing AI detection tools exhibited an average accuracy of less than 80% (Weber-Wulff et al. 2023 ). However, the availability of paraphrasing tools further complicates the detection of LLM-generated texts. Some AI content detectors were ineffective in identifying paraphrased texts (Anderson et al. 2023 ; Weber-Wulff et al. 2023 ). Moreover, some detectors may misclassify human-written articles, which can undermine the credibility of academic publications (Liang et al. 2023 ; Sadasivan et al. 2023 ).

Nevertheless, there have been advancements in AI content detectors. Turnitin and Originality.ai have shown excellent accuracy in discriminating between AI-generated and human-written essays in various academic disciplines (e.g., social sciences, natural sciences, and humanities) (Walters 2023 ). However, their effectiveness in detecting paraphrased academic articles remains uncertain. Importantly, the accuracy of universal AI detectors has shown inconsistencies across studies in different domains (Gao et al. 2023 ; Anderson et al. 2023 ; Walters 2023 ). Therefore, continuous efforts are necessary to identify detectors that can achieve near-perfect accuracy, especially in the detection of medical texts, which is of particular concern to the academic community.

In addition to using AI detectors to help identify AI-generated articles, it is crucial to assess the ability of human reviewers to detect AI-generated formal academic articles. A study found that four peer reviewers only achieved an average accuracy of 68% in identifying ChatGPT-generated biomedical abstracts (Gao et al. 2023 ). However, this study had limitations because the reviewers only assessed abstracts instead of full-text articles, and their assessments were limited to a binary choice of ‘yes’ or ‘no’ without providing any justifications for their decisions. The reported moderate accuracy is inadequate for informing new editorial policy regarding AI usage. To establish effective regulations for supervising AI usage in journal publishing, it is necessary to continuously explore the accuracy of experienced human reviewers and to understand the patterns and stylistic features of AI-generated content. This can help researchers, educators, and editors develop discipline-specific guidelines to effectively supervise AI usage in academic publishing.

Against this background, the current study aimed to (1) compare the accuracy of several common AI content detectors and human reviewers with different levels of research training in detecting AI-generated academic articles with or without paraphrasing; and (2) understand the rationale used by human reviewers to identify AI-generated content.

The current study was approved by the Institutional Review Board of a university. This study consisted of four stages: (1) identifying 50 published peer-reviewed papers from four high-impact journals; (2) generating artificial papers using ChatGPT; (3) rephrasing the ChatGPT-generated papers using a paraphrasing tool called Wordtune; and (4) employing six AI content detectors to distinguish between the original papers, ChatGPT-generated papers, and AI-rephrased papers. To determine human reviewers’ ability to discern between the original papers and AI-rephrased papers, four reviewers reviewed and assessed these two types of papers (Fig. 1 ).

figure 1

An outline of the study

Identifying peer-reviewed papers

As this study was conducted by researchers involved in rehabilitation sciences, only rehabilitation-related publications were considered. A researcher searched on PubMed in June 2023 using a search strategy involving: (“Neurological Rehabilitation”[Mesh]) OR (“Cardiac Rehabilitation”[Mesh]) OR (“Pulmonary Rehabilitation” [Mesh]) OR (“Exercise Therapy”[Mesh]) OR (“Physical Therapy”[Mesh]) OR (“Activities of Daily Living”[Mesh]) OR (“Self Care”[Mesh]) OR (“Self-Management”[Mesh]). English rehabilitation-related articles published between June 2013 and June 2023 in one of four high-impact journals ( Nature, The Lancet, JAMA, and British Medical Journal [BMJ] ) were eligible for inclusion. Fifty articles were included and categorized into four categories (musculoskeletal, cardiopulmonary, neurology, and pediatric) (Appendix 1 ).
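
The search strategy can be scripted for reproducibility. Below is a minimal sketch using Biopython’s Entrez wrapper (scripting the query this way is our assumption, not the authors’ stated workflow), combining the listed MeSH terms, the four journals, the ten-year window, and an English-language limit; the contact email is a placeholder.

```python
from Bio import Entrez  # Biopython's wrapper around the NCBI E-utilities

Entrez.email = "researcher@example.org"  # placeholder; NCBI requires a contact address

mesh_terms = [
    "Neurological Rehabilitation", "Cardiac Rehabilitation", "Pulmonary Rehabilitation",
    "Exercise Therapy", "Physical Therapy", "Activities of Daily Living",
    "Self Care", "Self-Management",
]
journals = ["Nature", "The Lancet", "JAMA", "BMJ"]

query = (
    "(" + " OR ".join(f'"{t}"[Mesh]' for t in mesh_terms) + ")"
    + " AND (" + " OR ".join(f'"{j}"[Journal]' for j in journals) + ")"
    + ' AND ("2013/06/01"[Date - Publication] : "2023/06/30"[Date - Publication])'
    + " AND English[Language]"
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
record = Entrez.read(handle)
print(record["Count"], record["IdList"][:10])  # hit count and the first few PMIDs
```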

Generating academic articles using ChatGPT

ChatGPT (GPT-3.5-Turbo, OpenAI, USA) was used to generate the introduction, discussion, and conclusion sections of fabricated articles in July 2023. Specifically, before starting a conversation with ChatGPT, we gave the instruction “ Considering yourself as an academic writer ” to put it into a specific role. After that, we entered “ Please write a convincing scientific introduction on the topic of [original topic] with some citations in the text” into GPT-3.5-Turbo to generate the ‘Introduction’ section. The ‘Discussion’ section was generated by the request “Please critically discuss the methods and results below: [original method] and [original result], Please include citations in the text” . For the ‘Conclusions’ section, we instructed ChatGPT to create a summary of the generated discussion section with reference to the original title. Collectively, each ChatGPT-generated article comprised fabricated introduction, discussion, and conclusions sections, alongside the original methods and results sections.
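
A sketch of this prompting protocol, scripted with the OpenAI Python client (the study reports the prompts and the GPT-3.5-Turbo model; driving the model through the API rather than the chat interface, and the placeholder variables, are our assumptions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholders for material taken from each source article
original_title = "..."
original_methods = "..."
original_results = "..."

def generate_section(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Role-setting instruction given before each conversation in the study
            {"role": "system", "content": "Considering yourself as an academic writer"},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

introduction = generate_section(
    "Please write a convincing scientific introduction on the topic of "
    f"{original_title} with some citations in the text"
)
discussion = generate_section(
    f"Please critically discuss the methods and results below: {original_methods} "
    f"and {original_results}. Please include citations in the text"
)
```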

Rephrasing ChatGPT-generated articles using a paraphrasing tool

Wordtune (AI21 Labs, Tel Aviv, Israel) (Wordtune 2023 ), a widely used AI-powered writing assistant, was applied to paraphrase 50 ChatGPT-generated articles, specifically targeting the introduction, discussion, and conclusion sections, to enhance their authenticity.

Identification of AI-generated articles

Using AI content detectors

Six AI content detectors, which have been widely used (Walters 2023 ; Crothers 2023 ; Top 10 AI Detector Tools 2023 ), were applied to identify texts generated by AI language models in August 2023. They classified a given paper as “human-written” or “AI-generated”, with a confidence level reported as an AI score [% ‘confidence in predicting that the content was produced by an AI tool’] or a perplexity score [randomness or particularity of the text]. A lower perplexity score indicates that the text has relatively few random elements and is more likely to be written by generative AI (GPTZero 2023 ). The 50 original articles, 50 ChatGPT-generated articles, and 50 AI-rephrased articles were evaluated for authenticity by two paid (Originality.ai, Originality. AI Inc., Ontario, Canada; and Turnitin’s AI writing detection, Turnitin LLC, CA, USA) and four free AI content detectors (ZeroGPT, Munchberg, Germany; GPTZero, NJ, USA; Content at Scale, AZ, USA; and GPT-2 Output Detector, CA, USA). The authentic methods and results sections were not entered into the AI content detectors. Since the GPT-2 Output Detector has a restriction of 510 tokens per attempt, each article was divided into several parts for input, and the overall AI score of the article was calculated based on the mean score of all parts.
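
Because of the GPT-2 Output Detector’s 510-token cap, each article’s overall score was the mean of its parts. A minimal sketch of that chunk-and-average step, where `ai_score` is a stand-in for a call to the detector and whitespace-separated words are a rough proxy for the model’s subword tokens:

```python
from statistics import mean

def chunk_text(text: str, max_tokens: int = 510) -> list[str]:
    """Split text into consecutive parts of at most max_tokens words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def article_ai_score(text: str, ai_score) -> float:
    """Overall AI score of an article = mean score over all its parts."""
    return mean(ai_score(part) for part in chunk_text(text))
```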

Peer reviews by human reviewers

Four blinded reviewers with backgrounds in physiotherapy and varying levels of research training (two college student reviewers and two professorial reviewers) were recruited to review and discern articles. To minimize the risk of recall bias, a researcher randomly assigned the 50 original articles and 50 AI-rephrased articles (ChatGPT-generated articles after rephrasing) to two electronic folders by a coin toss. If an original article was placed in Folder 1, the corresponding AI-rephrased article was assigned to Folder 2. Reviewers were instructed to review all the papers in Folder 1 first and then wait for at least 7 days before reviewing papers in Folder 2. This approach would reduce the reviewers’ risk of remembering the details of the original papers and AI-rephrased articles on the same topic (Fisher & Radvansky 2018 ).
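
A small sketch of this paired assignment, with the coin toss simulated by a seeded random generator (the article IDs and seed are illustrative):

```python
import random

def assign_folders(article_ids, seed: int = 2023):
    """One toss per topic: the original goes to one folder, its AI-rephrased twin to the other."""
    rng = random.Random(seed)
    folder1, folder2 = [], []
    for article_id in article_ids:
        if rng.random() < 0.5:  # "heads": original to Folder 1
            folder1.append((article_id, "original"))
            folder2.append((article_id, "ai_rephrased"))
        else:
            folder1.append((article_id, "ai_rephrased"))
            folder2.append((article_id, "original"))
    return folder1, folder2

folder1, folder2 = assign_folders(range(1, 51))
```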

The four reviewers were instructed to use an online Google form (Appendix 2 ) to record their decision and the reasons behind it. Reviewers entered the article number on the Google form before reviewing each article. Once a reviewer had gathered sufficient information/confidence to make a decision, they gave a binary response (‘AI-rephrased’ or ‘human-written’) and selected their top three reasons from a list of options (i.e., coherence, creativity, evidence-based claims, grammatical errors, and vocabulary diversity) (Walters 2019; Lee 2022 ). The definitions of these reasons (Appendix 3 ) were explained to the reviewers beforehand. If reviewers could not find suitable options, they could enter additional responses. When the reviewer submitted the form, the total duration was automatically recorded by the system.

Statistical analysis

Descriptive analyses were reported when appropriate. Shapiro-Wilk tests were used to evaluate the normality of the data, while Levene’s tests were employed to assess the homogeneity of variance. Logarithmic transformation was applied to the ‘time taken’ data to achieve a normal distribution. Separate two-way repeated measures analyses of variance (ANOVA) with post-hoc comparisons were conducted to evaluate the effect of detectors and AI usage on AI scores, and the effect of reviewers and AI usage on the time taken. Separate paired t-tests with Bonferroni correction were applied for pairwise comparisons. GPTZero perplexity scores were compared among groups of articles using one-way repeated measures ANOVA, followed by separate paired t-tests with Bonferroni correction for pairwise comparisons. Receiver operating characteristic (ROC) curves were generated to determine the cutoff values giving the highest sensitivity and specificity for detecting AI articles by AI content detectors, and the area under the ROC curve (AUROC) was calculated. Inter-rater agreement was calculated using Fleiss’s kappa, with Cohen’s kappa and Bonferroni correction used for pairwise comparisons. The significance level was set at p < 0.05. All statistical analyses were performed using SPSS (version 26; SPSS Inc., Chicago, IL, USA).
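
To illustrate the ROC step, the sketch below (our reconstruction with scikit-learn rather than SPSS, run on synthetic scores) labels AI articles as 1 and originals as 0, computes the AUROC, and uses Youden’s J to pick the cutoff with the best sensitivity/specificity trade-off:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = np.concatenate([np.zeros(50), np.ones(100)])  # 50 originals, 100 AI articles
scores = np.concatenate([rng.uniform(0, 45, 50),       # synthetic detector AI scores (%)
                         rng.uniform(35, 100, 100)])

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)                            # Youden's J = sensitivity + specificity - 1
print(f"AUROC = {roc_auc_score(y_true, scores):.2f}, cutoff = {thresholds[best]:.1f}%, "
      f"sensitivity = {tpr[best]:.0%}, specificity = {1 - fpr[best]:.0%}")
```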

The accuracy of AI detectors in identifying AI articles

The accuracy of AI content detectors in identifying AI-generated articles is shown in Fig. 2 a and b. Notably, Originality.ai demonstrated perfect accuracy (100%) in detecting both ChatGPT-generated and AI-rephrased articles. ZeroGPT showed near-perfect accuracy (96%) in identifying ChatGPT-generated articles. The optimal ZeroGPT cut-off value for distinguishing between original and AI articles (ChatGPT-generated and AI-rephrased) was 42.45% (Fig. 3 a) , with a sensitivity of 98% and a specificity of 92%. The GPT-2 Output Detector achieved an accuracy of 96% in identifying ChatGPT-generated articles based on an AI score cutoff value of 1.46%, as suggested by previous research (Gao et al. 2023 ). Likewise, Turnitin showed near-perfect accuracy (94%) in discerning ChatGPT-generated articles but only correctly discerned 30% of AI-rephrased articles. GPTZero and Content at Scale only correctly identified 70 and 52% of ChatGPT-generated papers, respectively. While Turnitin did not misclassify any original articles, Content at Scale and GPTZero incorrectly classified 28 and 22% of the original articles, respectively. AI scores, or perplexity scores, in response to the original, ChatGPT-generated, and AI-rephrased articles from each AI content detector are shown in Appendix 4 . The classification of responses from each AI content detector is shown in Appendix 5 .

figure 2

The accuracy of artificial intelligence (AI) content detectors and human reviewers in identifying large language model (LLM)-generated texts. a The accuracy of six AI content detectors in identifying AI-generated articles; b the percentage of misclassification of human-written articles as AI-generated ones by detectors; c the accuracy of four human reviewers (reviewers 1 and 2 were college students, while reviewers 3 and 4 were professorial reviewers) in identifying AI-rephrased articles; and d the percentage of misclassifying human-written articles as AI-rephrased ones by reviewers

figure 3

The receiver operating characteristic (ROC) curve and the area under the ROC (AUROC) of artificial intelligence (AI) content detectors. a The ROC curve and AUROC of ZeroGPT for discriminating between original and AI articles, with the AUROC of 0.98; b the ROC curve and AUROC of GPTZero for discriminating between original and AI articles, with the AUROC of 0.312

All AI content detectors except Originality.ai gave rephrased articles lower AI scores than the corresponding ChatGPT-generated articles (Fig. 4a). Likewise, GPTZero showed that the perplexity scores of ChatGPT-generated (p < 0.001) and AI-rephrased (p < 0.001) texts were significantly lower than those of the original articles (Fig. 4b). The ROC curve of GPTZero perplexity scores for discriminating original from AI articles yielded an AUROC of 0.312 (Fig. 3b).

figure 4

Artificial intelligence (AI)-generated articles demonstrated reduced AI scores after rephrasing. a The mean AI scores of 50 ChatGPT-generated articles before and after rephrasing; b ChatGPT-generated articles demonstrated lower Perplexity scores computed by GPTZero as compared to original articles although increased after rephrasing; * p  < 0·05, ** p  < 0·01, *** p  < 0·001

The accuracy of reviewers in identifying AI-rephrased articles

The median time spent by the four reviewers to distinguish original and AI-rephrased articles was 5 minutes (min) 45 seconds (s) (interquartile range [IQR] 3 min 42 s to 9 min 7 s). The median time taken by each reviewer is shown in Appendix 6 . The two professorial reviewers demonstrated extremely high accuracy (96 and 100%) in discerning AI-rephrased articles, although both misclassified 12% of human-written articles as AI-rephrased (Fig. 2c and d, and Table 1 ). Although three original articles were misclassified as AI-rephrased by both professorial reviewers, they were correctly identified by Originality.ai and ZeroGPT. The common reasons for an article to be classified as AI-rephrased by reviewers were ‘incoherence’ (34.36%), ‘grammatical errors’ (20.26%), ‘insufficient evidence-based claims’ (16.15%), ‘vocabulary diversity’ (11.79%), ‘creativity’ (6.15%), ‘misuse of abbreviations’ (5.87%), ‘writing style’ (2.71%), ‘vague expression’ (1.81%), and ‘conflicting data’ (0.9%). Nevertheless, 12 of the 50 original articles were wrongly considered AI-rephrased by two or more reviewers; most of these misclassified articles were deemed to be incoherent and/or to lack vocabulary diversity. The frequency of the primary reason given by each reviewer and the frequency of the reasons given by all four reviewers for identifying AI-rephrased articles are shown in Fig. 5a and b, respectively.

figure 5

A The frequency of the primary reason for artificial intelligence (AI)-rephrased articles being identified by each reviewer. B The relative frequency of each reason for AI-rephrased articles being identified (based on the top three reasons given by the four reviewers)

Regarding the inter-rater agreement between two professorial reviewers, there was near-perfect agreement in the binary responses, with κ = 0.819 (95% confidence interval [CI] 0.705, 0.933, p <0.05), as well as fair agreements in the primary and second reasons, with κ = 0.211 (95% CI 0.011, 0.411, p <0.05) and κ = 0.216 (95% CI 0.024, 0.408, p <0.05), respectively.
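
For reference, the pairwise agreement statistic can be computed as sketched below with scikit-learn, using hypothetical binary calls (1 = judged AI-rephrased, 0 = judged human-written) rather than the study’s data:

```python
from sklearn.metrics import cohen_kappa_score

reviewer_3 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical calls, professorial reviewer 3
reviewer_4 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # hypothetical calls, professorial reviewer 4
print(f"Cohen's kappa = {cohen_kappa_score(reviewer_3, reviewer_4):.3f}")
```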

“Plagiarized” scores of ChatGPT-generated or AI-rephrased articles

Turnitin results showed that ChatGPT-generated and AI-rephrased articles had significantly lower ‘plagiarized’ scores (39.22 ± 10.60% and 23.16 ± 8.54%, respectively) than the original articles (99.06 ± 1.27%).

Likelihood of ChatGPT being used in original articles after the launch of GPT-3.5-Turbo

No significant differences were found in the AI scores or perplexity scores calculated by the six AI content detectors (p>0.05), or in the binary responses evaluated by reviewers ( p >0.05), when comparing the included original papers published before and after November 2022 (the release of ChatGPT).

Our study found that Originality.ai and ZeroGPT accurately detected AI-generated texts, regardless of whether they were rephrased or not. Additionally, Turnitin did not misclassify human-written articles. While professorial reviewers were generally able to discern AI-rephrased articles from human-written ones, they might misinterpret some human-written articles as AI-generated due to incoherent content and varied vocabulary. Conversely, AI-rephrased articles are more likely to go unnoticed by student reviewers.

What is the performance of generative AI in academic writing?

Lee et al. found that sentences written by GPT-3 tended to contain fewer grammatical or spelling errors than those written by humans (Lee 2022 ). However, ChatGPT does not necessarily minimize grammatical mistakes: in our study, reviewers identified ‘grammatical errors’ as the second most common reason for classifying an article as AI-rephrased. Our reviewers also noted that generative AI was more likely to use medical terminologies or abbreviations inappropriately, and even to generate fabricated data. These shortcomings could have a detrimental impact on academic dissemination. Collectively, generative AI is unlikely to create credible academic articles without the development of discipline-specific LLMs.

Can generative AI generate creative and in-depth thoughts?

Prior research reported that ChatGPT correctly answered 42.0 to 67.6% of questions in medical licensing examinations conducted in China, Taiwan, and the USA (Zong 2023 ; Wang 2023 ; Gilson 2023 ). However, our reviewers discovered that AI-generated articles offered only superficial discussion without substantial supporting evidence. Further, redundancy was observed in the content of AI-generated articles. Unless future advancements in generative AI can improve the interpretation of evidence-based content and incorporate in-depth and insightful discussion, its utility may be limited to serving as an information source for academic works.

Who can be deceived by ChatGPT? How can we address it?

ChatGPT is capable of creating realistic and intelligent-sounding text, including convincing data and references (Ariyaratne et al. 2023 ). Yeadon et al. found that ChatGPT-generated physics essays were graded as first-class work in a writing assessment at Durham University (Yeadon et al. 2023 ). We found that AI-generated content had a relatively low plagiarism rate. Together, these factors may encourage the misuse of AI technology for generating written assignments and disseminating misinformation among students. In a recent survey, Welding ( 2023 ) reported that 50% of 1,000 college students admitted to using AI tools to help complete assignments or exams. However, in our study, college student reviewers correctly identified an average of only 76% of AI-rephrased articles. Notably, our professorial reviewers found fabricated data in two AI-generated articles that the student reviewers overlooked, which highlights the possibility of AI-generated content deceiving junior researchers and impacting their learning. In short, the inherent limitations of ChatGPT as reported by experienced reviewers may help research students understand key points in critically appraising academic articles and become more competent in detecting AI-generated articles.

Which detectors are recommended for use?

Our study revealed that Originality.ai was the most sensitive and accurate platform for detecting AI-generated (including paraphrased) content, although it requires a subscription fee. ZeroGPT is an excellent free tool that exhibits high sensitivity and specificity for detecting AI articles when the AI score threshold is set at 42.45%. These findings could help monitor AI use in academic publishing and education, helping to tackle the ethical challenges posed by rapidly iterating AI technologies. Additionally, Turnitin, a platform widely used in educational institutions and by scientific journals, displayed perfect accuracy in detecting human-written articles and near-perfect accuracy in detecting ChatGPT-generated content, but proved susceptible to deception when confronted with AI-rephrased articles. This raises concerns among educators regarding the potential for students to evade Turnitin’s AI detection by using an AI rephrasing editor. As generative AI technologies continue to evolve, educators and researchers should regularly conduct similar studies to identify the most suitable AI content detectors.

AI content detectors employ different predictive algorithms. Some publicly available detectors use perplexity scores and related concepts to identify AI-generated writing. However, we found that GPTZero perplexity scores discriminated AI articles worse than chance (AUROC 0.312). As such, the effectiveness of perplexity-based methods as the machine learning foundation of an AI content detector remains debatable.

As with any novel technology, there are merits and demerits that require continuous improvement and development. Current AI content detectors have been developed as general-purpose tools that analyse text features, primarily the randomness of word choice and sentence lengths (Prillaman 2023 ). While technical issues such as algorithms, model tuning, and development are beyond the scope of this study, we have provided empirical evidence that offers potential directions for future advancements in AI content detectors. One area that requires further exploration is the development of AI content detectors trained on discipline-specific LLMs.

Should authors be concerned about their manuscripts being misinterpreted?

While AI-rephrasing tools may help non-native English writers and less experienced researchers prepare better academic articles, AI technologies may pose challenges to academic publishing and education. Previous research has suggested that AI content detectors may penalize non-native English writers with limited linguistic expressions due to simplified wording (Liang et al. 2023 ). However, scientific writing emphasizes precision and accurate expression of scientific evidence, often favouring succinctness over vocabulary diversity or complex sentence structures (Scholar Hangout 2023 ). This raises concerns about the potential misclassification of human-written academic papers as AI-generated, which could have negative implications for authors’ academic reputations. However, our results indicate that experienced reviewers are unlikely to misclassify human-written manuscripts as AI-generated if the articles present logical arguments, provide sufficient evidence-based support, and offer in-depth discussions. Therefore, authors should consider these factors when preparing their manuscripts to minimize the risk of misinterpretation.

Our study revealed that both AI content detectors and human reviewers occasionally misclassified certain original articles as AI-generated. However, it is noteworthy that no human-written articles were misclassified by both AI-content detectors and the two professorial reviewers simultaneously. Therefore, to minimize the risk of misclassifying human-written articles as AI-generated, editors of peer-reviewed journals may consider implementing a screening process that involves a reliable, albeit imperfect, AI-content detector in conjunction with the traditional peer-review process, which includes at least two reviewers. If both the AI content detectors and the peer reviewers consistently label a manuscript as AI-generated, the authors should be given the opportunity to appeal the decision. The editor-in-chief and one member of the editorial board can then evaluate the appeal and make a final decision.

Limitations

This study had several limitations. Firstly, ChatGPT-3.5 was used to fabricate articles, given its popularity; future studies should investigate the performance of upgraded LLMs. Secondly, although our analyses revealed no significant difference in the proportion of original papers classified as AI-written before and after November 2022 (the release of ChatGPT), we cannot guarantee that no original paper was assisted by generative AI during its writing; future studies should consider including papers published well before this date to validate our findings. Thirdly, although excellent inter-rater agreement on the binary score was found between the two professorial reviewers, our results should be interpreted with caution given the small number of reviewers and the lack of consistency between the two student reviewers. Future studies should address these limitations and extend our methodology to other disciplines and industries, with more reviewers, to enhance the generalizability of our findings and facilitate the development of strategies for detecting AI-generated content in various fields.
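
Inter-rater agreement of the kind reported here is commonly quantified with Cohen's kappa; the sketch below shows the computation on invented binary ratings (the study's actual agreement statistics and data are not reproduced).

```python
# Hedged sketch: Cohen's kappa on binary AI/human judgements from two raters.
from sklearn.metrics import cohen_kappa_score

# 1 = rated AI-generated, 0 = rated human-written (hypothetical ratings)
reviewer_a = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
reviewer_b = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa = {kappa:.2f}")  # values near 1 indicate strong agreement
```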

Conclusions

This is the first study to directly compare the accuracy of advanced AI detectors and human reviewers in detecting AI-generated medical writing after paraphrasing. Our findings substantiate that the established peer-review system can effectively mitigate the risk of publishing AI-generated academic articles. Moreover, certain AI content detectors (i.e., Originality.ai and ZeroGPT) can help editors or reviewers with the initial screening of AI-generated articles, upholding academic integrity in scientific publishing. Notably, the current version of ChatGPT is inadequate for generating rigorous scientific articles and carries the risk of fabricating data and misusing medical abbreviations. Continuous development of machine-learning strategies to improve AI detection accuracy in the health sciences is essential. This study provides empirical evidence and valuable insights for future research on the validation and development of effective detection tools, and it highlights the importance of proper supervision and regulation of AI usage in medical writing and publishing, so that stakeholders can responsibly harness AI technologies while maintaining scientific rigour.

Availability of data and materials

The data and materials used in the manuscript are available upon reasonable request to the corresponding author.

Abbreviations

  • AI: Artificial intelligence
  • LLM: Large language model
  • ChatGPT: Chat Generative Pre-trained Transformer
  • ROC: Receiver Operating Characteristic
  • AUROC: Area under the Receiver Operating Characteristic curve

Anderson N, Belavy DL, Perle SM, Hendricks S, Hespanhol L, Verhagen E, Memon AR (2023) AI did not write this manuscript, or did it? Can we trick the AI text detector into generating texts? The potential future of ChatGPT and AI in sports & exercise medicine manuscript generation. BMJ Open Sport Exerc Med 9(1):e001568


Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R (2023) A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 52:1755–1758

ChatGPT Statistics (2023) Detailed insights on users. https://www.demandsage.com/chatgpt-statistics/. Accessed 08 Nov 2023

Crothers E, Japkowicz N, Viktor HL (2023) Machine-generated text: a comprehensive survey of threat models and detection methods. IEEE Access


Fisher JS, Radvansky GA (2018) Patterns of forgetting. J Mem Lang 102:130–141

Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, Pearson AT (2023) Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digit Med 6:75

Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D (2023) How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ 9:e45312

GPTZero (2023) How do I interpret burstiness or perplexity? https://support.gptzero.me/hc/en-us/articles/15130070230551-How-do-I-interpret-burstiness-or-perplexity. Accessed 20 Aug 2023

Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ (2023) Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr 7:pkad010

Imran M, Almusharraf N (2023) Analyzing the role of ChatGPT as a writing assistant at higher education level: a systematic review of the literature. Contemp Educ Technol 15:ep464

Lee M, Liang P, Yang Q (2022) Coauthor: designing a human-ai collaborative writing dataset for exploring language model capabilities. In: CHI Conference on Human Factors in Computing Systems, 1–19 ACM, April 2022

Liang W, Yuksekgonul M, Mao Y, Wu E, Zou J (2023) GPT detectors are biased against non-native English writers. Patterns (N Y) 4(7):100779

Manohar N, Prasad SS (2023) Use of ChatGPT in academic publishing: a rare case of seronegative systemic lupus erythematosus in a patient with HIV infection. Cureus 15(2):e34616

Mehnen L, Gruarin S, Vasileva M, Knapp B (2023) ChatGPT as a medical doctor? A diagnostic accuracy study on common and rare diseases. medRxiv. https://doi.org/10.1101/2023.04.20.23288859

OpenAI (2023) Introducing ChatGPT. https://openai.com/blog/chatgpt. Accessed 30 Dec 2023

Patel SB, Lam K (2023) ChatGPT: the future of discharge summaries? Lancet Digital Health 5:e107–e108

Prillaman M (2023) 'ChatGPT detector' catches AI-generated papers with unprecedented accuracy. Nature. https://doi.org/10.1038/d41586-023-03479-4. Accessed 31 Dec 2023

Sadasivan V, Kumar A, Balasubramanian S, Wang W, Feizi S (2023) Can AI-generated text be reliably detected? arXiv e-prints: 2303.11156

Sallam M (2023) ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare 11(6):887

Scholar Hangout (2023) Maintaining accuracy in academic writing. https://www.manuscriptedit.com/scholar-hangout/maintaining-accuracy-in-academic-writing/. Accessed 10 Sep 2023

Sinha RK, Deb Roy A, Kumar N, Mondal H (2023) Applicability of ChatGPT in assisting to solve higher order problems in pathology. Cureus 15(2):e35237

Stokel-Walker C (2023) ChatGPT listed as author on research papers: many scientists disapprove. Nature 613(7945):620–621

Top 10 AI detector tools you should use (2023) eWeek. https://www.eweek.com/artificial-intelligence/ai-detector-software/#chart. Accessed Aug 2023

Walters WH (2023) The effectiveness of software designed to detect AI-generated writing: a comparison of 16 AI text detectors. Open Information Science 7:20220158

Wang Y-M, Shen H-W, Chen T-J (2023) Performance of ChatGPT on the pharmacist licensing examination in Taiwan. J Chin Med Assoc 10:1097

Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Foltýnek T, Guerrero-Dib J, Popoola O, Šigut P, Waddington L (2023) Testing of detection tools for AI-generated text. Int J Educ Integrity 19(1):26

Welding L (2023) Half of college students say using AI on schoolwork is cheating or plagiarism. Best Colleges

Wordtune (2023) https://app.wordtune.com/. Accessed 16 Jul 2023

Yeadon W, Inyang O-O, Mizouri A, Peach A, Testrow CP (2023) The death of the short-form physics essay in the coming AI revolution. Phys Educ 58:035027

Zong H, Li J, Wu E, Wu R, Lu J, Shen B (2023) Performance of ChatGPT on Chinese National Medical Licensing Examinations: a five-year examination evaluation study for physicians, pharmacists and nurses. medRxiv. https://doi.org/10.1101/2023.07.09.23292415


Acknowledgements

Not applicable.

Funding

The current study was supported by the GP Batteries Industrial Safety Trust Fund (R-ZDDR).

Author information

Authors and Affiliations

Department of Rehabilitation Science, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Jae Q. J. Liu, Kelvin T. K. Hui, Fadi Al Zoubi, Zing Z. X. Zhou, Curtis C. H. Yu, Jeremy R. Chang & Arnold Y. L. Wong

Department of Orthopedic Surgery, Rush University Medical Center, Chicago, IL, USA

Dino Samartzis


Contributions

Jae QJ Liu, Kelvin TK Hui and Arnold YL Wong conceptualized the study; Fadi Al Zoubi, Zing Z.X. Zhou, Curtis CH Yu, and Arnold YL Wong acquired the data; Jae QJ Liu and Kelvin TK Hui curated the data; Jae QJ Liu and Jeremy R Chang analyzed the data; Arnold YL Wong was responsible for funding acquisition and project supervision; Jae QJ Liu drafted the original manuscript; Arnold YL Wong and Dino Samartzis edited the manuscript.

Corresponding author

Correspondence to Arnold Y. L. Wong .

Ethics declarations

Competing interests

All authors declare no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Liu, J.Q.J., Hui, K.T.K., Al Zoubi, F. et al. The great detectives: humans versus AI detectors in catching large language model-generated medical writing. Int J Educ Integr 20, 8 (2024). https://doi.org/10.1007/s40979-024-00155-6


Received: 27 December 2023

Accepted: 13 March 2024

Published: 20 May 2024

DOI: https://doi.org/10.1007/s40979-024-00155-6

Keywords

  • Paraphrasing tools
  • Generative AI
  • Academic integrity
  • AI content detectors
  • Peer review
  • Perplexity scores
  • Scientific rigour

International Journal for Educational Integrity

ISSN: 1833-2595



  • Open access
  • Published: 15 May 2024

Causal relationship of interleukin-6 and its receptor on sarcopenia traits using mendelian randomization

  • Baixing Chen 1 ,
  • Shaoshuo Li 2 ,
  • Shi Lin 3 &
  • Hang Dong 3 , 4  

Nutrition Journal volume 23, Article number: 51 (2024)


Previous research has extensively examined the role of interleukin 6 (IL-6) in sarcopenia. However, the presence of a causal relationship between IL-6, its receptor (IL-6R), and sarcopenia remains unclear.

In this study, we utilized summary-level data from genome-wide association studies (GWAS) of appendicular lean mass (ALM), hand grip strength, and walking pace. Single nucleotide polymorphisms (SNPs) were employed as genetic instruments for IL-6 and IL-6R to estimate their causal effects on sarcopenia traits. We adopted the Mendelian randomization (MR) approach to investigate these associations, using the inverse variance weighted (IVW) method as the primary analytical approach. Additionally, we performed sensitivity analyses to validate the reliability of the MR results.

This study revealed a significant negative association between IL-6R and hand grip strength. The effect estimates of the main IL-6R and eQTL IL-6R instruments on left grip strength were −0.013 (SE = 0.004, p < 0.001) and −0.029 (SE = 0.007, p < 0.001), respectively, while for right grip strength the estimates were −0.011 (SE = 0.001, p < 0.001) and −0.021 (SE = 0.008, p = 0.005). However, there was no evidence of an association of IL-6R with ALM or walking pace. In addition, IL-6 did not affect sarcopenia traits.

Our findings suggest a negative association between IL-6R and hand grip strength. Targeting IL-6R may therefore hold value as a therapeutic approach for impaired hand grip strength.

Peer Review reports

Introduction

Sarcopenia is a syndrome characterized by a loss of muscle mass and strength, which often leads to functional impairment and adverse outcomes [ 1 ]. This condition involves a restructuring of the overall muscle composition, including the transformation of muscle fibers and the infiltration of lipids [ 2 ]. As a result, muscle power is diminished, and the risk of falls, mortality, disability, and hospitalization is elevated compared to individuals without sarcopenia [ 3 ].

Extensive research has demonstrated that interleukins, particularly interleukin-6 (IL-6), play a critical role in the development of skeletal muscle wasting [4]. They do so by activating molecular pathways that disrupt the balance between protein synthesis and catabolism [5]. For example, IL-6 has catabolic effects on muscle proteins [6]. In clinical settings, geriatric patients with acute infection-induced inflammation who received piroxicam, a non-steroidal anti-inflammatory drug, showed decreased levels of IL-6 alongside better muscle performance. Several systematic reviews have consistently shown that elevated levels of inflammatory cytokines are inversely correlated with muscle strength and mass [7, 8, 9]. On the other hand, therapeutic strategies involving IL-6R antagonists, such as tocilizumab, have shown promise in increasing muscle mass. IL-6 promotes inflammatory responses via the membrane-bound or circulating soluble IL-6R (sIL-6R). Two studies demonstrated that administration of anti-mouse IL-6 receptor antibodies improved muscle mass in mice [10, 11]. In a study by Tournadre et al., tocilizumab had beneficial effects on lean mass, without significant changes in fat mass, in patients with rheumatoid arthritis [12]. IL-6 and sIL-6R are important factors in the regulation of inflammation, but their effects on muscle mass and function suggest that their relationship with sarcopenia is complex and may be mediated by other factors. Further research is needed to better understand the mechanisms underlying the association.

Mendelian randomization (MR) is an analytical approach that utilizes germline genetic markers as instrumental variables to assess potential risk factors [13]. The random allocation of genetic variants from parents to offspring during gametogenesis provides protection against confounding factors typically encountered in observational studies and helps to mitigate issues of reverse causation [14]. The growing availability of genetic association data for various traits and diseases has significantly enhanced the utility of MR methods for establishing reliable causal inferences. In particular, the inclusion of genome-wide association study (GWAS) data from large-scale consortia has the potential to enhance the statistical power of MR analyses for detecting causal effects [15]. MR provides a causal inference approach for establishing whether the association between IL-6 or sIL-6R and sarcopenia is causal or merely a correlation. Therefore, the aim of this study was to comprehensively investigate the causal relationship of IL-6 and sIL-6R with sarcopenia using a two-sample Mendelian randomization approach, and to determine whether targeting these factors may be a viable strategy for preventing or treating muscle loss.

Study design

This two-sample Mendelian randomization (MR) study utilized summary-level data and relied on three key assumptions about the instrumental variables (genetic variants). Firstly, the genetic variants were strongly associated with the exposure of interest, as demonstrated by genome-wide association p-values below 5 × 10⁻⁸, fulfilling the relevance assumption. Secondly, the genetic variants were independent of potential confounders of the exposure–outcome relationship, fulfilling the independence assumption. Lastly, the genetic variants were assumed to affect the outcome only through the exposure of interest and not through other causal pathways, fulfilling the exclusion restriction assumption. These assumptions are crucial for valid MR analysis: genetic variants must satisfy all three for the analysis to provide valid causal inferences [16].

Data sources of exposure

Table S1 presents an overview of the data sources used in this study, including sample sizes and characteristics of the GWAS data sources. The interleukin GWAS data were obtained from Folkersen et al. [17] and adjusted for population structure and study-specific covariates. To fulfil the first MR assumption, we included all single nucleotide polymorphisms (SNPs) that strongly and independently (R² < 0.001) predicted the exposures of interest at genome-wide significance (P < 5 × 10⁻⁸). We also removed SNPs with potential pleiotropic effects, which could introduce confounding [18]. To strengthen the validity of our findings, specific genetic instruments were constructed for IL-6 and IL-6R: genetic variants were selected within a 150 kb region around the genes encoding IL-6 (ENSG00000136244) and IL-6R (ENSG00000160712), establishing distinct genetic tools aimed at directly influencing these target genes [19]. To assess the strength of the selected genetic predictors, we excluded SNPs with an F-statistic below 10 and calculated the mean F-statistic of the remaining SNPs for each exposure factor; this approach helps evaluate the robustness and reliability of the genetic predictors [20].
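
The sketch below illustrates this kind of instrument filtering on a toy SNP table; the column names, gene coordinates, and values are assumptions for demonstration, not the study's data, and the per-SNP F-statistic is approximated as (beta/se)².

```python
# Illustrative instrument selection: genome-wide significance, a window
# around the target gene, and a weak-instrument F-statistic cut-off.
import pandas as pd

P_GWAS = 5e-8          # relevance threshold
WINDOW = 150_000       # +/- 150 kb around the gene encoding the protein
GENE_START, GENE_END = 154_377_669, 154_441_926  # illustrative coordinates
F_MIN = 10             # weak-instrument cut-off

snps = pd.DataFrame({
    "rsid": ["rs1", "rs2", "rs3", "rs4"],
    "pos":  [154_400_000, 154_300_000, 154_500_000, 160_000_000],
    "beta": [0.12, 0.08, 0.15, 0.20],
    "se":   [0.015, 0.020, 0.018, 0.010],
    "pval": [1e-15, 4e-9, 2e-12, 3e-30],
})

snps["F"] = (snps["beta"] / snps["se"]) ** 2  # approximate per-SNP F-statistic

in_window = snps["pos"].between(GENE_START - WINDOW, GENE_END + WINDOW)
keep = snps[(snps["pval"] < P_GWAS) & in_window & (snps["F"] >= F_MIN)]
print(keep[["rsid", "F"]])
print("mean F of retained SNPs:", round(keep["F"].mean(), 1))
# LD clumping (R^2 < 0.001) additionally requires a reference panel and is
# typically done with PLINK or TwoSampleMR's clump_data(); it is not shown here.
```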

Outcomes in GWAS: sarcopenia traits

Summary statistics for sarcopenia traits were extracted from the Pei et al. GWAS [21] and the UK Biobank (Neale Lab) [22]. Sarcopenia traits included appendicular lean mass (ALM), hand grip strength, and walking pace, as follows: ALM (n = 450,243), left grip strength (n = 335,821), right grip strength (n = 335,842), and walking pace (n = 335,349). The ALM data were obtained from the UK Biobank, which used bioelectric impedance analysis (BIA) to assess the combined muscle mass of the arms and legs [23]. Grip strength, which is moderately correlated with overall body strength, was selected as a dependable surrogate for whole-body strength [24]. The walking pace test, being quick, safe, and highly reliable, is widely adopted as a practical test for sarcopenia in clinical settings [24].

MR analysis

In this study, the inverse variance weighted (IVW) method was used to assess causal effects and to evaluate the bi-directional relationship between interleukins and sarcopenia traits. This method pools the Wald ratio of each SNP's effect on the outcome into a single causal estimate. We applied a Bonferroni correction, setting a threshold of P < 0.0125 (α = 0.05/4 outcomes) to account for multiple comparisons and maintain a stringent level of statistical significance. Nevertheless, if the instrumental variables (IVs) violate the assumption of no horizontal pleiotropy, the estimates obtained through the IVW method may be biased. To account for this potential bias, we performed sensitivity analyses using two additional MR methods. Firstly, we applied the weighted median (WM) method developed by Bowden et al. [25], which is robust and produces consistent causal estimates provided that at least 50% of the weight comes from valid IVs. Secondly, we employed MR-Egger regression to assess the presence of unbalanced pleiotropy and significant heterogeneity; this method typically requires a larger sample size to achieve the same level of precision in estimating the exposure variation [26].
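
As a worked illustration of the estimator, here is a minimal fixed-effect IVW sketch on invented summary statistics; this is not the study's pipeline, which used the TwoSampleMR R package. Each SNP's Wald ratio is beta_outcome / beta_exposure, with the standard error approximated as se_outcome / |beta_exposure|, and the IVW estimate is the inverse-variance-weighted mean of the ratios.

```python
import math

# (beta_exposure, beta_outcome, se_outcome) per SNP -- hypothetical values
snps = [(0.12, -0.0016, 0.0005),
        (0.08, -0.0010, 0.0004),
        (0.15, -0.0020, 0.0006)]

ratios = [bo / bx for bx, bo, _ in snps]           # per-SNP Wald ratios
ses = [so / abs(bx) for bx, _, so in snps]          # approximate ratio SEs
weights = [1.0 / se**2 for se in ses]               # inverse-variance weights

beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
se_ivw = math.sqrt(1.0 / sum(weights))
z = beta_ivw / se_ivw
print(f"IVW estimate = {beta_ivw:.4f} (SE = {se_ivw:.4f}, z = {z:.2f})")
# The resulting p-value would be compared against the Bonferroni-corrected
# threshold described above (alpha = 0.05/4 outcomes).
```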

Sensitivity analysis

Horizontal pleiotropy can be a potential issue in MR studies, as it can bias estimates of causal effects. To ensure the robustness of our findings, we conducted additional analyses to detect pleiotropy and heterogeneity. To evaluate heterogeneity, we employed Cochran's Q statistic, with a p-value of less than 0.05 indicating significant heterogeneity among the included instruments [27]. Additionally, we employed MR-Egger regression to appraise horizontal pleiotropy, again considering a P value of less than 0.05 significant [28]. If the IVW result was significant (P < 0.05), no pleiotropy or heterogeneity was identified, and the beta values of the other methods pointed in the same direction, we considered the finding a positive result even if the other methods were not themselves significant. All analyses were conducted using R (version 4.1.1) and the R package "TwoSampleMR" [29].
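
A brief sketch of the Cochran's Q heterogeneity check, reusing the per-SNP Wald ratios and weights from the IVW sketch above (values invented; p < 0.05 would suggest heterogeneity):

```python
from scipy.stats import chi2

ratios = [-0.0133, -0.0125, -0.0133]   # hypothetical per-SNP Wald ratios
weights = [57600.0, 40000.0, 62500.0]  # inverse-variance weights

beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
q_stat = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
df = len(ratios) - 1                   # degrees of freedom: k - 1 instruments
p_value = chi2.sf(q_stat, df)          # upper-tail chi-square probability
print(f"Q = {q_stat:.3f}, df = {df}, p = {p_value:.3f}")
```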

The effect of interleukin on sarcopenia traits

In this study, two and three SNPs were available for the main IL-6 instrument and the eQTL IL-6 instrument, respectively, and seven and eight SNPs for the main IL-6R and eQTL IL-6R instruments, all at genome-wide significance (p < 5 × 10⁻⁸) (Table S2). No causal relationship between IL-6 and sarcopenia traits was observed. Given that only two and three SNPs were available for IL-6, a p-value threshold of 5 × 10⁻⁵ was also tested to include more SNPs; however, the same SNPs were identified. In the IVW analysis, a significant negative association between IL-6R levels and grip strength was observed. The effect estimates of the main IL-6R and eQTL IL-6R instruments on left grip strength were −0.013 (SE = 0.004, p < 0.001) and −0.029 (SE = 0.007, p < 0.001), while for right grip strength the estimates were −0.011 (SE = 0.001, p < 0.001) and −0.021 (SE = 0.008, p = 0.005), as presented in Table 1. However, no causal association of IL-6R with ALM or walking pace was observed. To assess the robustness of our findings, we performed several sensitivity analyses, including Cochran's Q test and the MR-Egger intercept test (Table 2); these methods were not applicable to the main IL-6 instrument due to the limited number of instrumental variables (IVs) available.

As IL-6 is also a myokine secreted by muscle cells, a reverse MR analysis was performed to identify the effect of sarcopenia traits on interleukins. The results of this analysis, presented in Table S3, indicated no statistically significant causal effects of sarcopenia traits on IL-6 or IL-6R. Furthermore, Cochran's Q tests, detailed in Table S4, yielded P-values greater than 0.05, suggesting an absence of heterogeneity among the instrumental variables used in our analysis. These findings suggest that, within the limits of our study design and the genetic instruments employed, sarcopenia traits do not exert a detectable causal influence on IL-6 or IL-6R levels.

Discussion

This MR study demonstrated a significant and negative association between IL-6R and hand grip strength. However, no evidence of an association between IL-6 and sarcopenia traits was found. These findings offer new insights into the impact of interleukins on hand grip strength.

Immune aging is closely associated with the development of sarcopenia, leading to the loss of skeletal muscle mass and function [30]. Inflammatory parameters have been shown to be inversely related to hand grip strength [31], suggesting a potential role of the immune system in skeletal muscle protein metabolism during aging [32]. IL-6, as one of the inflammatory factors, plays a crucial role in modulating muscle anabolism and catabolism in response to tissue damage or infection. Previous meta-analyses have consistently demonstrated a negative association between IL-6 and muscle strength and mass, although the potential influence of reverse causation bias or confounding factors in these associations must be considered [9]. Prolonged exposure to IL-6 has been associated with the promotion of muscle atrophy through the suppression of muscle anabolism and energy homeostasis, as well as the direct induction of muscle catabolism [33]. During the development of sarcopenia, the secretion of inflammatory factors increases, contributing to a low-grade inflammatory state. Moreover, skeletal muscle mass and function decline with age, accompanied by a decrease in the synthesis and secretion of myogenic inflammatory factors, which disrupts skeletal muscle energy metabolism [34]. However, no relationship between IL-6 and sarcopenia traits was observed in this study. This inconsistency may arise because the previous associations were derived from observational studies, which are prone to confounding and reverse causation, whereas MR analysis can help to overcome these limitations by using genetic variants that are far less likely to be influenced by confounding factors [35]. It is also possible that IL-6R has a biological function and an effect on muscle strength distinct from those of IL-6, which may explain why our MR analysis found a significant association with sarcopenia traits for IL-6R but not for IL-6. Overall, our findings suggest that the previously reported association between IL-6 and muscle mass may not be causal, and that IL-6R may have a direct effect on hand grip strength. Further studies are needed to confirm these findings and to investigate the underlying biological mechanisms.

Furthermore, the methods used to measure muscle mass can vary between studies, and this can contribute to differences in findings. ALM is a commonly used measure of muscle mass in the context of sarcopenia, but it is not the only measure available [ 36 ]. Other measures, such as whole-body muscle mass or muscle cross-sectional area, may be more appropriate in certain contexts. Additionally, the way in which ALM is calculated (i.e., whether it is divided by height, weight, or BMI) can also impact the results.

Our study has several limitations that should be considered when interpreting our findings. First, our analysis was conducted using large-scale GWAS data, which may not represent the entire population; additionally, the sample size for the IL-6 GWAS was relatively small, which may limit the generalizability of our findings. Second, while MR can help to address confounding, there may be additional confounders that were not accounted for in our study; for example, factors such as diet, physical activity, or medication use may influence both interleukin levels and muscle mass. Third, we focused only on IL-6 and IL-6R and did not examine the full range of interleukins that may be relevant to muscle mass. Other interleukins may have different effects on muscle mass, and examining a broader range of them may provide a more complete picture of their relationship with muscle mass.

Conclusions

This is the first study, to our knowledge, to investigate potential causal associations of IL-6 and IL-6R with sarcopenia traits using Mendelian randomization. Our findings indicate that IL-6R is negatively associated with hand grip strength. Further research should aim to identify the underlying mechanisms of action; targeting IL-6R may prove valuable for treatment.

Availability of supporting data

The datasets analyzed in this study are publicly available summary statistics.

Chung HY, Cesari M, Anton S, Marzetti E, Giovannini S, Seo AY, Carter C, Yu BP, Leeuwenburgh C. Molecular inflammation: underpinnings of aging and age-related diseases. Ageing Res Rev. 2009;8:18–30.


Lang T, Streeper T, Cawthon P, Baldwin K, Taaffe DR, Harris TB. Sarcopenia: etiology, clinical consequences, intervention, and assessment. Osteoporos Int. 2010;21:543–59.

Yeung SSY, Reijnierse EM, Pham VK, Trappenburg MC, Lim WK, Meskers CGM, Maier AB. Sarcopenia and its association with falls and fractures in older adults: a systematic review and meta-analysis. J Cachexia Sarcopenia Muscle. 2019;10:485–500.


Picca A, Coelho-Junior HJ, Calvani R, Marzetti E, Vetrano DL. Biomarkers shared by frailty and sarcopenia in older adults: a systematic review and meta-analysis. Ageing Res Rev. 2022;73:101530.

Meng SJ, Yu LJ. Oxidative stress, molecular inflammation and sarcopenia. Int J Mol Sci. 2010;11:1509–26.


Krabbe KS, Pedersen M, Bruunsgaard H. Inflammatory mediators in the elderly. Exp Gerontol. 2004;39:687–99.

Bano G, Trevisan C, Carraro S, Solmi M, Luchini C, Stubbs B, Manzato E, Sergi G, Veronese N. Inflammation and sarcopenia: a systematic review and meta-analysis. Maturitas. 2017;96:10–5.


Miko A, Poto L, Matrai P, Hegyi P, Furedi N, Garami A, Illes A, Solymar M, Vincze A, Balasko M, et al. Gender difference in the effects of interleukin-6 on grip strength - a systematic review and meta-analysis. BMC Geriatr. 2018;18:107.

Tuttle CSL, Thang LAN, Maier AB. Markers of inflammation and their association with muscle strength and mass: a systematic review and meta-analysis. Ageing Res Rev. 2020;64:101185.

Tsujinaka T, Fujita J, Ebisui C, Yano M, Kominami E, Suzuki K, Tanaka K, Katsume A, Ohsugi Y, Shiozaki H, Monden M. Interleukin 6 receptor antibody inhibits muscle atrophy and modulates proteolytic systems in interleukin 6 transgenic mice. J Clin Invest. 1996;97:244–9.

Ando K, Takahashi F, Kato M, Kaneko N, Doi T, Ohe Y, Koizumi F, Nishio K, Takahashi K. Tocilizumab, a proposed therapy for the cachexia of Interleukin6-expressing lung cancer. PLoS ONE. 2014;9:e102436.

Tournadre A, Pereira B, Dutheil F, Giraud C, Courteix D, Sapin V, Frayssac T, Mathieu S, Malochet-Guinamand S, Soubrier M. Changes in body composition and metabolic profile during interleukin 6 inhibition in rheumatoid arthritis. J Cachexia Sarcopenia Muscle. 2017;8:639–46.

Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–65.

Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27:1133–63.

Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, Consortium E-I. Using published data in mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015;30:543–52.

Burgess S, Thompson SG. Mendelian randomization: methods for using genetic variants in causal estimation. CRC; 2015.

Folkersen L, Gustafsson S, Wang Q, Hansen DH, Hedman AK, Schork A, Page K, Zhernakova DV, Wu Y, Peters J, et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab. 2020;2:1135–48.

Bahls M, Leitzmann MF, Karch A, Teumer A, Dorr M, Felix SB, Meisinger C, Baumeister SE, Baurecht H. Physical activity, sedentary behavior and risk of coronary artery disease, myocardial infarction and ischemic stroke: a two-sample mendelian randomization study. Clin Res Cardiol. 2021;110:1564–73.

Cupido AJ, Asselbergs FW, Natarajan P, CHARGE Inflammation Working Group, Ridker PM, Hovingh GK, Schmidt AF. Dissecting the IL-6 pathway in cardiometabolic disease: a mendelian randomization study on both IL6 and IL6R. Br J Clin Pharmacol. 2022;88:2875–84.

Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan NA, Thompson JR. Assessing the suitability of summary data for two-sample mendelian randomization analyses using MR-Egger regression: the role of the I² statistic. Int J Epidemiol. 2016;45:1961–74.


Pei YF, Liu YZ, Yang XL, Zhang H, Feng GJ, Wei XT, Zhang L. The genetic architecture of appendicular lean mass characterized by association analysis in the UK Biobank study. Commun Biol. 2020;3:608.

Neale Lab. UK Biobank GWAS results. http://www.nealelab.is/uk-biobank/

Cox N. UK Biobank shares the promise of big data. Nature. 2018;562:194–5.

Cruz-Jentoft AJ, Bahat G, Bauer J, Boirie Y, Bruyere O, Cederholm T, Cooper C, Landi F, Rolland Y, Sayer AA, et al. Sarcopenia: revised European consensus on definition and diagnosis. Age Ageing. 2019;48:601.

Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in mendelian randomization with some Invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–14.

Chen X, Hong X, Gao W, Luo S, Cai J, Liu G, Huang Y. Causal relationship between physical activity, leisure sedentary behaviors and COVID-19 risk: a mendelian randomization study. J Transl Med. 2022;20:216.

Song J, Li A, Qian Y, Liu B, Lv L, Ye D, Sun X, Mao Y. Genetically predicted circulating levels of cytokines and the risk of cancer. Front Immunol. 2022;13:886144.

Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–25.

Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408.

Argiles JM, Busquets S, Stemmler B, Lopez-Soriano FJ. Cachexia and sarcopenia: mechanisms and potential targets for intervention. Curr Opin Pharmacol. 2015;22:100–6.

Visser M, Pahor M, Taaffe DR, Goodpaster BH, Simonsick EM, Newman AB, Nevitt M, Harris TB. Relationship of interleukin-6 and tumor necrosis factor-alpha with muscle mass and muscle strength in elderly men and women: the Health ABC Study. J Gerontol Biol Sci Med Sci. 2002;57:M326–332.


Palla AR, Ravichandran M, Wang YX, Alexandrova L, Yang AV, Kraft P, Holbrook CA, Schurch CM, Ho ATV, Blau HM. Inhibition of prostaglandin-degrading enzyme 15-PGDH rejuvenates aged muscle mass and strength. Science. 2021;371.

Belizario JE, Fontes-Oliveira CC, Borges JP, Kashiabara JA, Vannier E. Skeletal muscle wasting and renewal: a pivotal role of myokine IL-6. Springerplus. 2016;5:619.

Zhang X, Li H, He M, Wang J, Wu Y, Li Y. Immune system and sarcopenia: presented relationship and future perspective. Exp Gerontol. 2022;164:111823.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23:R89–98.

Cawthon PM. Assessment of lean mass and physical performance in sarcopenia. J Clin Densitom. 2015;18:467–71.


Funding

This work was funded by the National Natural Science Foundation of China (82004390), the National Studio Construction Projects for Famous Experts in Traditional Chinese Medicine (Huang Feng Studio N75, 2022), and the China Scholarship Council.

Author information

Authors and Affiliations

Department of Development and Regeneration, KU Leuven, Leuven, Belgium

Baixing Chen

Wuxi Affiliated Hospital of Nanjing University of Chinese Medicine, Wuxi, China

Shaoshuo Li

Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China

Shi Lin & Hang Dong

Department of Traumatology, The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China


Contributions

Baixing Chen: study conception and design, acquisition of data, analysis and interpretation of data, writing the manuscript; Shaoshuo Li: critical revision of the manuscript, analysis and interpretation of data; Shi Lin: acquisition of data, critical revision of the manuscript; Hang Dong: study conception and design, drafting of the manuscript, critical revision of the manuscript.

Corresponding author

Correspondence to Hang Dong .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

No potential conflict of interest was reported by the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Chen, B., Li, S., Lin, S. et al. Causal relationship of interleukin-6 and its receptor on sarcopenia traits using mendelian randomization. Nutr J 23, 51 (2024). https://doi.org/10.1186/s12937-024-00958-w


Received: 28 May 2023

Accepted: 07 May 2024

Published: 15 May 2024

DOI: https://doi.org/10.1186/s12937-024-00958-w

Keywords

  • Interleukin 6
  • IL-6 receptor
  • Genome-wide association studies
  • Mendelian randomization

Nutrition Journal

ISSN: 1475-2891


EJIFCC volume 25(3), October 2014

Peer Review in Scientific Publications: Benefits, Critiques, & A Survival Guide

Jacalyn Kelly

1 Clinical Biochemistry, Department of Pediatric Laboratory Medicine, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada

Tara Sadeghieh

Khosrow Adeli

2 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada

3 Chair, Communications and Publications Division (CPD), International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), Milan, Italy

The authors declare no conflicts of interest regarding publication of this article.

Peer review has been defined as a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field. It functions to encourage authors to meet the accepted high standards of their discipline and to control the dissemination of research data to ensure that unwarranted claims, unacceptable interpretations or personal views are not published without prior expert review. Despite its widespread use by most journals, the peer review process has also been widely criticized because of the slowness of the process in publishing new findings and because of perceived bias by the editors and/or reviewers. Within the scientific community, peer review has become an essential component of the academic writing process. It helps ensure that papers published in scientific journals answer meaningful research questions and draw accurate conclusions based on professionally executed experimentation. Submission of low-quality manuscripts has become increasingly prevalent, and peer review acts as a filter to prevent this work from reaching the scientific community. The major advantage of a peer review process is that peer-reviewed articles provide a trusted form of scientific communication. Since scientific knowledge is cumulative and builds on itself, this trust is particularly important. Despite the positive impacts of peer review, critics argue that the peer review process stifles innovation in experimentation and acts as a poor screen against plagiarism. Despite its downfalls, no foolproof system has yet been developed to take the place of peer review; however, researchers have been looking into electronic means of improving the peer review process. Unfortunately, the recent explosion in online-only/electronic journals has led to mass publication of a large number of scientific articles with little or no peer review. This poses significant risk to advances in scientific knowledge and its future potential. The current article summarizes the peer review process, highlights the pros and cons associated with different types of peer review, and describes new methods for improving peer review.

WHAT IS PEER REVIEW AND WHAT IS ITS PURPOSE?

Peer Review is defined as “a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field” ( 1 ). Peer review is intended to serve two primary purposes. Firstly, it acts as a filter to ensure that only high quality research is published, especially in reputable journals, by determining the validity, significance and originality of the study. Secondly, peer review is intended to improve the quality of manuscripts that are deemed suitable for publication. Peer reviewers provide suggestions to authors on how to improve the quality of their manuscripts, and also identify any errors that need correcting before publication.

HISTORY OF PEER REVIEW

The concept of peer review was developed long before the scholarly journal. In fact, the peer review process is thought to have been used as a method of evaluating written work since ancient Greece ( 2 ). The peer review process was first described by a physician named Ishaq bin Ali al-Rahwi of Syria, who lived from 854-931 CE, in his book Ethics of the Physician ( 2 ). There, he stated that physicians must take notes describing the state of their patients’ medical conditions upon each visit. Following treatment, the notes were scrutinized by a local medical council to determine whether the physician had met the required standards of medical care. If the medical council deemed that the appropriate standards were not met, the physician in question could receive a lawsuit from the maltreated patient ( 2 ).

The invention of the printing press in 1453 allowed written documents to be distributed to the general public ( 3 ). At this time, it became more important to regulate the quality of the written material that became publicly available, and editing by peers increased in prevalence. In 1620, Francis Bacon wrote the work Novum Organum, where he described what eventually became known as the first universal method for generating and assessing new science ( 3 ). His work was instrumental in shaping the Scientific Method ( 3 ). In 1665, the French Journal des sçavans and the English Philosophical Transactions of the Royal Society were the first scientific journals to systematically publish research results ( 4 ). Philosophical Transactions of the Royal Society is thought to be the first journal to formalize the peer review process in 1665 ( 5 ), however, it is important to note that peer review was initially introduced to help editors decide which manuscripts to publish in their journals, and at that time it did not serve to ensure the validity of the research ( 6 ). It did not take long for the peer review process to evolve, and shortly thereafter papers were distributed to reviewers with the intent of authenticating the integrity of the research study before publication. The Royal Society of Edinburgh adhered to the following peer review process, published in their Medical Essays and Observations in 1731: “Memoirs sent by correspondence are distributed according to the subject matter to those members who are most versed in these matters. The report of their identity is not known to the author.” ( 7 ). The Royal Society of London adopted this review procedure in 1752 and developed the “Committee on Papers” to review manuscripts before they were published in Philosophical Transactions ( 6 ).

Peer review in the systematized and institutionalized form has developed immensely since the Second World War, at least partly due to the large increase in scientific research during this period ( 7 ). It is now used not only to ensure that a scientific manuscript is experimentally and ethically sound, but also to determine which papers sufficiently meet the journal’s standards of quality and originality before publication. Peer review is now standard practice by most credible scientific journals, and is an essential part of determining the credibility and quality of work submitted.

IMPACT OF THE PEER REVIEW PROCESS

Peer review has become the foundation of the scholarly publication system because it effectively subjects an author’s work to the scrutiny of other experts in the field. Thus, it encourages authors to strive to produce high quality research that will advance the field. Peer review also supports and maintains integrity and authenticity in the advancement of science. A scientific hypothesis or statement is generally not accepted by the academic community unless it has been published in a peer-reviewed journal ( 8 ). The Institute for Scientific Information ( ISI ) only considers journals that are peer-reviewed as candidates to receive Impact Factors. Peer review is a well-established process which has been a formal part of scientific communication for over 300 years.

OVERVIEW OF THE PEER REVIEW PROCESS

The peer review process begins when a scientist completes a research study and writes a manuscript that describes the purpose, experimental design, results, and conclusions of the study. The scientist then submits this paper to a suitable journal that specializes in a relevant research field, a step referred to as pre-submission. The editors of the journal will review the paper to ensure that the subject matter is in line with that of the journal, and that it fits with the editorial platform. Very few papers pass this initial evaluation. If the journal editors feel the paper sufficiently meets these requirements and is written by a credible source, they will send the paper to accomplished researchers in the field for a formal peer review. Peer reviewers are also known as referees (this process is summarized in Figure 1 ). The role of the editor is to select the most appropriate manuscripts for the journal, and to implement and monitor the peer review process. Editors must ensure that peer reviews are conducted fairly, and in an effective and timely manner. They must also ensure that there are no conflicts of interest involved in the peer review process.

Figure 1. Overview of the review process

When a reviewer is provided with a paper, he or she reads it carefully and scrutinizes it to evaluate the validity of the science, the quality of the experimental design, and the appropriateness of the methods used. The reviewer also assesses the significance of the research, and judges whether the work will contribute to advancement in the field by evaluating the importance of the findings and determining the originality of the research. Additionally, reviewers identify any scientific errors and references that are missing or incorrect. Peer reviewers give recommendations to the editor regarding whether the paper should be accepted, rejected, or improved before publication in the journal. The editor will mediate author-referee discussion in order to clarify the priority of certain referee requests, suggest areas that can be strengthened, and overrule reviewer recommendations that are beyond the study’s scope ( 9 ). If the paper is accepted on the peer reviewers’ recommendation, it goes into the production stage, where it is tweaked and formatted by the editors, and finally published in the scientific journal. An overview of the review process is presented in Figure 1 .

WHO CONDUCTS REVIEWS?

Peer reviews are conducted by scientific experts with specialized knowledge on the content of the manuscript, as well as by scientists with a more general knowledge base. Peer reviewers can be anyone who has competence and expertise in the subject areas that the journal covers. Reviewers can range from young and up-and-coming researchers to old masters in the field. Often, the young reviewers are the most responsive and deliver the best quality reviews, though this is not always the case. On average, a reviewer will conduct approximately eight reviews per year, according to a study on peer review by the Publishing Research Consortium (PRC) ( 7 ). Journals will often have a pool of reviewers with diverse backgrounds to allow for many different perspectives. They will also keep a rather large reviewer bank, so that reviewers do not get burnt out, overwhelmed or time constrained from reviewing multiple articles simultaneously.

WHY DO REVIEWERS REVIEW?

Referees are typically not paid to conduct peer reviews and the process takes considerable effort, so the question is raised as to what incentive referees have to review at all. Some feel an academic duty to perform reviews, and are of the mentality that if their peers are expected to review their papers, then they should review the work of their peers as well. Reviewers may also have personal contacts with editors, and may want to assist as much as possible. Others review to keep up-to-date with the latest developments in their field, and reading new scientific papers is an effective way to do so. Some scientists use peer review as an opportunity to advance their own research as it stimulates new ideas and allows them to read about new experimental techniques. Other reviewers are keen on building associations with prestigious journals and editors and becoming part of their community, as sometimes reviewers who show dedication to the journal are later hired as editors. Some scientists see peer review as a chance to become aware of the latest research before their peers, and thus be first to develop new insights from the material. Finally, in terms of career development, peer reviewing can be desirable as it is often noted on one’s resume or CV. Many institutions consider a researcher’s involvement in peer review when assessing their performance for promotions ( 11 ). Peer reviewing can also be an effective way for a scientist to show their superiors that they are committed to their scientific field ( 5 ).

ARE REVIEWERS KEEN TO REVIEW?

A 2009 international survey of 4,000 peer reviewers, conducted by the charity Sense About Science at the British Science Festival at the University of Surrey, found that 90% of reviewers were keen to peer review ( 12 ). One third of respondents to the survey said they were happy to review up to five papers per year, and an additional one third of respondents were happy to review up to ten.

HOW LONG DOES IT TAKE TO REVIEW ONE PAPER?

On average, it takes approximately six hours to review one paper ( 12 ), however, this number may vary greatly depending on the content of the paper and the nature of the peer reviewer. One in every 100 participants in the “Sense About Science” survey claims to have taken more than 100 hours to review their last paper ( 12 ).

HOW TO DETERMINE IF A JOURNAL IS PEER REVIEWED

Ulrichsweb is a directory that provides information on over 300,000 periodicals, including information regarding which journals are peer reviewed ( 13 ). After logging into the system using an institutional login (e.g., from the University of Toronto), search terms, journal titles or ISSN numbers can be entered into the search bar. The database provides the title, publisher, and country of origin of the journal, and indicates whether the journal is still actively publishing. The black book symbol (labelled ‘refereed’) reveals that the journal is peer reviewed.

THE EVALUATION CRITERIA FOR PEER REVIEW OF SCIENTIFIC PAPERS

As previously mentioned, when a reviewer receives a scientific manuscript, he/she will first determine if the subject matter is well suited for the content of the journal. The reviewer will then consider whether the research question is important and original, a process which may be aided by a literature scan of review articles.

Scientific papers submitted for peer review usually follow a specific structure that begins with the title, followed by the abstract, introduction, methodology, results, discussion, conclusions, and references. The title must be descriptive and include the concept and organism investigated, and potentially the variable manipulated and the systems used in the study. The peer reviewer evaluates if the title is descriptive enough, and ensures that it is clear and concise. A study by the National Association of Realtors (NAR) published by the Oxford University Press in 2006 indicated that the title of a manuscript plays a significant role in determining reader interest, as 72% of respondents said they could usually judge whether an article will be of interest to them based on the title and the author, while 13% of respondents claimed to always be able to do so ( 14 ).

The abstract is a summary of the paper, which briefly mentions the background or purpose, methods, key results, and major conclusions of the study. The peer reviewer assesses whether the abstract is sufficiently informative and if the content of the abstract is consistent with the rest of the paper. The NAR study indicated that 40% of respondents could determine whether an article would be of interest to them based on the abstract alone 60-80% of the time, while 32% could judge an article based on the abstract 80-100% of the time ( 14 ). This demonstrates that the abstract alone is often used to assess the value of an article.

The introduction of a scientific paper presents the research question in the context of what is already known about the topic, in order to identify why the question being studied is of interest to the scientific community, and what gap in knowledge the study aims to fill ( 15 ). The introduction identifies the study’s purpose and scope, briefly describes the general methods of investigation, and outlines the hypothesis and predictions ( 15 ). The peer reviewer determines whether the introduction provides sufficient background information on the research topic, and ensures that the research question and hypothesis are clearly identifiable.

The methods section describes the experimental procedures, and explains why each experiment was conducted. The methods section also includes the equipment and reagents used in the investigation. The methods section should be detailed enough that it can be used to repeat the experiment ( 15 ). Methods are written in the past tense and in the active voice. The peer reviewer assesses whether the appropriate methods were used to answer the research question, and if they were written with sufficient detail. If information is missing from the methods section, it is the peer reviewer’s job to identify what details need to be added.

The results section is where the outcomes of the experiment and trends in the data are explained without judgement, bias or interpretation ( 15 ). This section can include statistical tests performed on the data, as well as figures and tables in addition to the text. The peer reviewer ensures that the results are described with sufficient detail, and determines their credibility. Reviewers also confirm that the text is consistent with the information presented in tables and figures, and that all figures and tables included are important and relevant ( 15 ). The peer reviewer will also make sure that table and figure captions are appropriate both contextually and in length, and that tables and figures present the data accurately.

The discussion section is where the data is analyzed. Here, the results are interpreted and related to past studies ( 15 ). The discussion describes the meaning and significance of the results in terms of the research question and hypothesis, and states whether the hypothesis was supported or rejected. This section may also provide possible explanations for unusual results and suggestions for future research ( 15 ). The discussion should end with a conclusions section that summarizes the major findings of the investigation. The peer reviewer determines whether the discussion is clear and focused, and whether the conclusions are an appropriate interpretation of the results. Reviewers also ensure that the discussion addresses the limitations of the study, any anomalies in the results, the relationship of the study to previous research, and the theoretical implications and practical applications of the study.

The references are found at the end of the paper, and list all of the information sources cited in the text in support of the background, methods, and interpretation of results. Depending on the citation style used, the references are listed in alphabetical order by author last name, or numbered according to the order in which they appear in the paper. The peer reviewer ensures that references are used appropriately, cited accurately, formatted correctly, and that none are missing.

Finally, the peer reviewer determines whether the paper is clearly written and whether the content seems logical. After thoroughly reading the entire manuscript, they judge whether it meets the journal’s standards for publication, and whether it falls within the top 25% of papers in its field ( 16 ), which establishes its priority for publication. An overview of what a peer reviewer looks for when evaluating a manuscript, in order of importance, is presented in Figure 2 .

[Figure 2. How a peer reviewer evaluates a manuscript]

To increase the chance of success in the peer review process, the author must ensure that the paper fully complies with the journal guidelines before submission. The author must also be open to criticism and suggested revisions, and learn from mistakes made in previous submissions.

ADVANTAGES AND DISADVANTAGES OF THE DIFFERENT TYPES OF PEER REVIEW

The peer review process is generally conducted in one of three ways: open review, single-blind review, or double-blind review. In an open review, both the author of the paper and the peer reviewer know one another’s identity. Alternatively, in single-blind review, the reviewer’s identity is kept private, but the author’s identity is revealed to the reviewer. In double-blind review, the identities of both the reviewer and author are kept anonymous. Open peer review is advantageous in that it discourages the reviewer from leaving malicious comments, being careless, or procrastinating on completing the review ( 2 ). It encourages reviewers to be open and honest without being disrespectful. Open reviewing also discourages plagiarism amongst authors ( 2 ). On the other hand, open peer review can also prevent reviewers from being honest for fear of developing a poor rapport with the author. The reviewer may withhold or tone down their criticisms in order to be polite ( 2 ). This is especially true when a younger reviewer is assigned a more esteemed author’s work; the reviewer may be hesitant to provide criticism for fear that it will damage their relationship with a superior ( 2 ). According to the Sense About Science survey, editors find that completely open reviewing decreases the number of people willing to participate, and leads to reviews of little value ( 12 ). In the aforementioned study by the PRC, only 23% of authors surveyed had experience with open peer review ( 7 ).

Single-blind peer review is by far the most common. In the PRC study, 85% of authors surveyed had experience with single-blind peer review ( 7 ). This method is advantageous as the reviewer is more likely to provide honest feedback when their identity is concealed ( 2 ). This allows the reviewer to make independent decisions without the influence of the author ( 2 ). The main disadvantage of reviewer anonymity, however, is that reviewers who receive manuscripts on subjects similar to their own research may be tempted to delay completing the review in order to publish their own data first ( 2 ).

Double-blind peer review is advantageous as it prevents the reviewer from being biased against the author based on their country of origin or previous work ( 2 ). This allows the paper to be judged based on the quality of the content, rather than the reputation of the author. The Sense About Science survey indicates that 76% of researchers think double-blind peer review is a good idea ( 12 ), and the PRC survey indicates that 45% of authors have had experience with double-blind peer review ( 7 ). The disadvantage of double-blind peer review is that, especially in niche areas of research, it can sometimes be easy for the reviewer to determine the identity of the author based on writing style, subject matter or self-citation, and thus, impart bias ( 2 ).
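
These three models differ only in which identities are disclosed. The short Python sketch below simply encodes the visibility rules described above; the type and field names are illustrative inventions, not any journal system’s actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewModel:
    name: str
    reviewer_sees_author: bool   # is the author's identity revealed to the reviewer?
    author_sees_reviewer: bool   # is the reviewer's identity revealed to the author?

# The three standard models described above.
MODELS = [
    ReviewModel("open", reviewer_sees_author=True, author_sees_reviewer=True),
    ReviewModel("single-blind", reviewer_sees_author=True, author_sees_reviewer=False),
    ReviewModel("double-blind", reviewer_sees_author=False, author_sees_reviewer=False),
]

for m in MODELS:
    print(f"{m.name:>12}: reviewer sees author={m.reviewer_sees_author}, "
          f"author sees reviewer={m.author_sees_reviewer}")
```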

Masking the author’s identity from peer reviewers, as is the case in double-blind review, is generally thought to minimize bias and maintain review quality. A study by Justice et al. in 1998 investigated whether masking author identity affected the quality of the review ( 17 ). One hundred and eighteen manuscripts were randomized: 26 were peer reviewed as normal, and 92 were moved into the ‘intervention’ arm, where editor quality assessments were completed for 77 manuscripts and author quality assessments were completed for 40 manuscripts ( 17 ). There was no perceived difference in quality between the masked and unmasked reviews. Additionally, the masking itself was often unsuccessful, especially with well-known authors ( 17 ). However, an earlier study conducted by McNutt et al. had different results ( 18 ): in that case, blinding was successful 73% of the time, and the quality of review was slightly higher when author identity was masked ( 18 ). Justice et al. argued that this difference was too small to be consequential; moreover, their study targeted only biomedical journals, so the results cannot be generalized to journals of a different subject matter ( 17 ). The difficulty of masking the identities of well-known authors also introduced a flaw in the methods. Regardless, Justice et al. concluded that masking author identity from reviewers may not improve review quality ( 17 ).

In addition to open, single-blind and double-blind peer review, there are two experimental forms of peer review. In some cases, following publication, papers may be subjected to post-publication peer review. As many papers are now published online, the scientific community has the opportunity to comment on these papers, engage in online discussions and post formal reviews. For example, online publishers PLOS and BioMed Central have enabled scientists to post comments on published papers if they are registered users of the site ( 10 ). Philica is another journal launched with this experimental form of peer review. Only 8% of authors surveyed in the PRC study had experience with post-publication review ( 7 ). Another experimental form of peer review, called dynamic peer review, has also emerged. Dynamic peer review is conducted on websites such as Naboj, which allow scientists to review articles in preprint repositories ( 19 ). The review is a continuous process conducted on the repository itself, which allows the public to see both the article and the reviews as the article is being developed ( 19 ). Dynamic peer review helps prevent plagiarism, as the scientific community will already be familiar with the work before the peer-reviewed version appears in print ( 19 ). Dynamic review also reduces the time lag between manuscript submission and publishing. An example of a preprint server is arXiv, developed by Paul Ginsparg in 1991, which is used primarily by physicists ( 19 ). These alternative forms of peer review remain experimental and largely unestablished, whereas traditional peer review is time-tested and still highly utilized. All methods of peer review have their advantages and deficiencies, and all are prone to error.

PEER REVIEW OF OPEN ACCESS JOURNALS

Open access (OA) journals are becoming increasingly popular as they allow the potential for widespread distribution of publications in a timely manner ( 20 ). Nevertheless, there can be issues regarding the peer review process of open access journals. In a study published in Science in 2013, John Bohannon submitted 304 slightly different versions of a fictional scientific paper (written by a fake author, working out of a non-existent institution) to a selected group of OA journals. This study was performed in order to determine whether papers submitted to OA journals are properly reviewed before publication, in comparison to subscription-based journals. The journals in this study were selected from the Directory of Open Access Journals (DOAJ) and Beall’s List, a list of journals that are potentially predatory, and all required a fee for publishing ( 21 ). Of the 304 journals, 157 accepted the fake paper, suggesting that acceptance was based on financial interest rather than the quality of the article itself, while 98 journals promptly rejected the fakes ( 21 ). Although this study highlights useful information on the problems associated with lower quality publishers that do not have an effective peer review system in place, the article also generalizes the study results to all OA journals, which can be detrimental to the general perception of OA journals. Two limitations of the study made it impossible to accurately determine the relationship between peer review and OA journals: 1) there was no control group (subscription-based journals), and 2) the fake papers were sent to a non-randomized selection of journals, resulting in bias.
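
For concreteness, the reported figures can be tallied directly. In this sketch the ‘unresolved’ count is merely inferred from the difference, since only the accept and reject counts are reported above:

```python
# Reported outcomes of Bohannon's 2013 submission sting (ref. 21).
submitted = 304
accepted = 157
rejected = 98
unresolved = submitted - accepted - rejected  # inferred: no clear decision reported

print(f"accepted: {accepted / submitted:.0%} of all submissions")       # ~52%
print(f"accepted among decided: {accepted / (accepted + rejected):.0%}")  # ~62%
print(f"unresolved: {unresolved}")                                       # 49
```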

JOURNAL ACCEPTANCE RATES

Based on a recent survey, the average acceptance rate for papers submitted to scientific journals is about 50% ( 7 ). Of all submitted manuscripts, 20% are rejected prior to review and 30% are rejected following review ( 7 ). Of the 50% that are accepted, most (41% of all submissions) are accepted on the condition of revision, while only 9% are accepted without a request for revision ( 7 ).
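
A quick worked example translates these survey percentages into expected counts per 1,000 submissions (an arithmetic illustration of the figures above, not additional data):

```python
# Expected outcomes per 1,000 submissions, using the survey percentages (ref. 7).
submissions = 1_000
outcomes = {
    "rejected before review": 0.20,
    "rejected after review": 0.30,
    "accepted with revisions": 0.41,
    "accepted without revisions": 0.09,
}

for label, share in outcomes.items():
    print(f"{label:<28}{share * submissions:>5.0f}")

# The four shares partition all submissions.
assert abs(sum(outcomes.values()) - 1.0) < 1e-9
```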

SATISFACTION WITH THE PEER REVIEW SYSTEM

Based on a recent survey by the PRC, 64% of academics are satisfied with the current system of peer review, and only 12% claimed to be ‘dissatisfied’ ( 7 ). The large majority, 85%, agreed with the statement that ‘scientific communication is greatly helped by peer review’ ( 7 ). There was a similarly high level of support (83%) for the idea that peer review ‘provides control in scientific communication’ ( 7 ).

HOW TO PEER REVIEW EFFECTIVELY

The following are ten tips on how to be an effective peer reviewer as indicated by Brian Lucey, an expert on the subject ( 22 ):

1) Be professional

Peer review is a mutual responsibility among fellow scientists, and scientists are expected, as part of the academic community, to take part in peer review. If one is to expect others to review their work, they should commit to reviewing the work of others as well, and put effort into it.

2) Be pleasant

If the paper is of low quality, suggest that it be rejected, but do not leave ad hominem comments. There is no benefit to being ruthless.

3) Read the invite

When a journal emails a scientist to ask them to conduct a peer review, it will usually provide a link to either accept or decline the invitation. Do not reply to the email; respond through the link.

4) Be helpful

Suggest how the authors can overcome the shortcomings in their paper. A review should guide the author on what is good and what needs work from the reviewer’s perspective.

5) Be scientific

The peer reviewer plays the role of a scientific peer, not an editor for proofreading or decision-making. Don’t fill a review with comments on editorial and typographic issues. Instead, focus on adding value with scientific knowledge, commenting on the credibility of the research conducted and the conclusions drawn. If the paper has a lot of typographical errors, suggest in the review that it be professionally proofread.

6) Be timely

Stick to the timeline given when conducting a peer review. Editors track who is reviewing what and when, and will know if someone is late in completing a review. It is important to be timely, both out of respect for the journal and the author, and to avoid developing a reputation for missing review deadlines.

7) Be realistic

The peer reviewer must be realistic about the work presented, the changes they suggest, and their own role. Reviewers who set the bar too high by proposing overly ambitious changes simply force editors to override their suggestions.

8) Be empathetic

Ensure that the review is scientific, helpful and courteous. Be sensitive and respectful with word choice and tone in a review.

9) Be open

Remember that both specialists and generalists can provide valuable insight when peer reviewing. Editors will try to get both specialised and general reviewers for any particular paper to allow for different perspectives. If someone is asked to review, the editor has determined that they have a valid and useful role to play, even if the paper is not in their area of expertise.

10) Be organised

A review requires structure and logical flow. Before submitting a review, the reviewer should proofread it for structural, grammatical and spelling errors, as well as for clarity. Most publishers provide short guides on structuring a peer review on their website. Begin with an overview of the proposed improvements; then provide feedback on the paper structure, the quality of data sources and methods of investigation used, the logical flow of argument, and the validity of conclusions drawn. Then provide feedback on style, voice and lexical concerns, with suggestions on how to improve.

In addition, the American Physiological Society (APS) recommends in its Peer Review 101 Handout that peer reviewers should put themselves in both the editor’s and author’s shoes to ensure that they provide what both the editor and the author need and expect ( 11 ). To please the editor, the reviewer should ensure that the peer review is completed on time, and that it provides clear explanations to back up recommendations. To be helpful to the author, the reviewer must ensure that their feedback is constructive. It is suggested that the reviewer take time to think about the paper; they should read it once, wait at least a day, and then re-read it before writing the review ( 11 ). The APS also suggests that graduate students and researchers pay attention to how peer reviewers edit their work, as well as to what edits they find helpful, in order to learn how to peer review effectively ( 11 ). Additionally, it is suggested that graduate students practice reviewing by editing their peers’ papers and asking a faculty member for feedback on their efforts. It is recommended that young scientists offer to peer review as often as possible in order to become skilled at the process ( 11 ). The majority of students, fellows and trainees do not get formal training in peer review, but rather learn by observing their mentors. According to the APS, one acquires experience through networking and referrals, and should therefore try to strengthen relationships with journal editors by offering to review manuscripts ( 11 ). The APS also suggests that experienced reviewers provide constructive feedback to students and junior colleagues on their peer review efforts, and encourages them to peer review to demonstrate the importance of this process in improving science ( 11 ).

The peer reviewer should only comment on areas of the manuscript that they are knowledgeable about ( 23 ). If there is any section of the manuscript they feel they are not qualified to review, they should mention this in their comments and not provide further feedback on that section. The peer reviewer is not permitted to share any part of the manuscript with a colleague (even if they may be more knowledgeable in the subject matter) without first obtaining permission from the editor ( 23 ). If a peer reviewer comes across something they are unsure of in the paper, they can consult the literature to try to gain insight. It is important for scientists to remember that if a paper can be improved by the expertise of one of their colleagues, the journal must be informed of the colleague’s help, and approval must be obtained for their colleague to read the protected document. Additionally, the colleague must be identified in the confidential comments to the editor, in order to ensure that he/she is appropriately credited for any contributions ( 23 ). It is the job of the reviewer to make sure that the colleague assisting is aware of the confidentiality of the peer review process ( 23 ). Once the review is complete, the manuscript must be destroyed and cannot be saved electronically by the reviewers ( 23 ).

COMMON ERRORS IN SCIENTIFIC PAPERS

When performing a peer review, there are some common scientific errors to look out for. Most of these errors are violations of logic and common sense: these may include contradicting statements, unwarranted conclusions, suggestion of causation when there is only support for correlation, inappropriate extrapolation, circular reasoning, or pursuit of a trivial question ( 24 ). It is also common for authors to suggest that two variables are different because the effects of one variable are statistically significant while the effects of the other variable are not, rather than directly comparing the two variables ( 24 ). Authors sometimes overlook a confounding variable and do not control for it, or forget to include important details on how their experiments were controlled or the physical state of the organisms studied ( 24 ). Another common fault is the author’s failure to define terms or use words with precision, as these practices can mislead readers ( 24 ). Jargon and/or misused terms can be a serious problem in papers. Inaccurate statements about specific citations are also a common occurrence ( 24 ). Additionally, many studies produce knowledge that can be applied to areas of science outside the scope of the original study; it is therefore better for reviewers to look at the novelty of the idea, conclusions, data, and methodology, rather than scrutinize whether or not the paper answered the specific question at hand ( 24 ). Although it is important to recognize these points, when performing a review it is generally better practice for the peer reviewer not to focus on a checklist of things that could be wrong, but rather to carefully identify the problems specific to each paper and continuously ask themselves if anything is missing ( 24 ). An extremely detailed description of how to conduct peer review effectively is presented in the paper How I Review an Original Scientific Article by Frederic G. Hoppin, Jr., which can be accessed through the American Physiological Society website under the Peer Review Resources section.

CRITICISM OF PEER REVIEW

A major criticism of peer review is that there is little evidence that the process actually works: that it effectively screens for good quality scientific work, or that it improves the quality of the scientific literature. As a 2002 study published in the Journal of the American Medical Association concluded, ‘Editorial peer review, although widely used, is largely untested and its effects are uncertain’ ( 25 ). Critics also argue that peer review is not effective at detecting errors. Highlighting this point, an experiment by Godlee et al. published in the British Medical Journal (BMJ) inserted eight deliberate errors into a paper that was nearly ready for publication, and then sent the paper to 420 potential reviewers ( 7 ). Of the 420 reviewers who received the paper, 221 (53%) responded. The average number of errors spotted was two, no reviewer spotted more than five, and 35 reviewers (16%) did not spot any.
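
A few lines of arithmetic make the implied detection rates explicit (these are simple recomputations of the figures reported above):

```python
# Figures from the Godlee et al. error-detection experiment (ref. 7).
sent = 420
responded = 221
errors_planted = 8
mean_errors_found = 2
found_none = 35

print(f"response rate: {responded / sent:.0%}")                          # ~53%
print(f"mean detection rate: {mean_errors_found / errors_planted:.0%}")  # 25%
print(f"share spotting no errors: {found_none / responded:.0%}")         # ~16%
```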

Another criticism of peer review is that the process is not conducted thoroughly by scientific conferences whose goal is to obtain large numbers of submitted papers. Such conferences often accept any paper sent in, regardless of its credibility or the prevalence of errors, because the more papers they accept, the more money they can make from author registration fees ( 26 ). This misconduct was exposed by three MIT graduate students, Jeremy Stribling, Dan Aguayo and Maxwell Krohn, who in 2005 developed a simple computer program called SCIgen that generates nonsense papers and presents them as scientific papers ( 26 ). A nonsense SCIgen paper submitted to a conference was promptly accepted. Nature reported in 2014 that French researcher Cyril Labbé had discovered sixteen SCIgen nonsense papers published by the German academic publisher Springer ( 26 ). Over 100 nonsense papers generated by SCIgen were also published by the Institute of Electrical and Electronics Engineers (IEEE) ( 26 ). Both organisations have been working to remove the papers. Labbé developed a program to detect SCIgen papers and has made it freely available to ensure publishers and conference organizers do not accept nonsense work in the future. It is available at this link: http://scigendetect.on.imag.fr/main.php ( 26 ).
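
SCIgen works by recursively expanding a context-free grammar, yielding text that is grammatically plausible but meaningless. The toy generator below illustrates the same idea; its grammar and vocabulary are invented for this sketch and are vastly simpler than SCIgen’s.

```python
import random

# A tiny context-free grammar in the spirit of SCIgen. Uppercase keys are
# non-terminals; anything not in the grammar is emitted as a literal word.
GRAMMAR = {
    "SENTENCE": [["NP", "VP", "."]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB", "that", "NP", "VP"]],
    "ADJ": [["scalable"], ["probabilistic"], ["decentralized"]],
    "NOUN": [["framework"], ["methodology"], ["heuristic"]],
    "VERB": [["refines"], ["synthesizes"], ["evaluates"]],
}

def expand(symbol: str) -> str:
    if symbol not in GRAMMAR:          # terminal word: emit as-is
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

print(expand("SENTENCE"))
# e.g. "the scalable framework refines our heuristic ."
```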

Additionally, peer review is often criticized for being unable to accurately detect plagiarism. However, many believe that detecting plagiarism cannot practically be included as a component of peer review. As explained by Alice Tuff, development manager at Sense About Science, ‘The vast majority of authors and reviewers think peer review should detect plagiarism (81%) but only a minority (38%) think it is capable. The academic time involved in detecting plagiarism through peer review would cause the system to grind to a halt’ ( 27 ). Publishing house Elsevier began developing electronic plagiarism tools with the help of journal editors in 2009 to help address this issue ( 27 ).
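
The core idea behind automated plagiarism screening can be illustrated with word n-gram ‘shingles’ and Jaccard overlap. This is a deliberate simplification for illustration only; commercial tools compare manuscripts against large corpora and use far more sophisticated matching.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping word n-grams ('shingles') from a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Overlap between two shingle sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

original = "peer review acts as a filter to ensure high quality research"
suspect = "peer review acts as a filter that ensures high quality work"

similarity = jaccard(shingles(original), shingles(suspect))
print(f"shingle similarity: {similarity:.2f}")  # high values suggest copying
```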

It has also been argued that peer review has lowered research quality by limiting creativity amongst researchers. Proponents of this view claim that peer review has discouraged scientists from pursuing innovative research ideas and bold research questions that have the potential to make major advances and paradigm shifts in the field, because they believe that this work will likely be rejected by their peers upon review ( 28 ). Indeed, in some cases peer review may result in rejection of innovative research, as some studies may not seem particularly strong initially, yet may be capable of yielding very interesting and useful developments when examined under different circumstances, or in the light of new information ( 28 ). Scientists who do not believe in peer review argue that the process stifles the development of ingenious ideas, and thus the release of fresh knowledge and new developments into the scientific community.

Peer review is also criticized because the number of people competent to conduct reviews is small compared to the vast number of papers that need reviewing. An enormous number of papers are published (1.3 million papers in 23,750 journals in 2006), more than the available pool of competent peer reviewers could possibly have reviewed ( 29 ). As a result, people who lack the required expertise to analyze the quality of a research paper are conducting reviews, and weak papers are being accepted. It is now possible to publish any paper in an obscure journal that claims to be peer-reviewed, though the paper or journal itself could be substandard ( 29 ). On a similar note, the US National Library of Medicine indexes 39 journals that specialize in alternative medicine, and though they all identify themselves as “peer-reviewed”, they rarely publish any high quality research ( 29 ). This highlights the fact that peer review of more controversial or specialized work is typically performed by people who are interested in it and hold views or opinions similar to the author’s, which can bias their reviews. For instance, a paper on homeopathy is likely to be reviewed by fellow practicing homeopaths, and thus is likely to be accepted as credible, though other scientists may find the paper to be nonsense ( 29 ). In some cases, papers are initially published, but their credibility is challenged at a later date and they are subsequently retracted. Retraction Watch is a website dedicated to revealing papers that have been retracted after publishing, potentially due to improper peer review ( 30 ).

Additionally, despite its many positive outcomes, peer review is criticized for delaying the dissemination of new knowledge into the scientific community, and for being an unpaid activity that takes scientists’ time away from activities that they would otherwise prioritize, such as research and teaching, for which they are paid ( 31 ). As described by Eva Amsen, Outreach Director for F1000Research, peer review was originally developed as a means of helping editors choose which papers to publish when journals had to limit the number of papers they could print in one issue ( 32 ). However, nowadays most journals are available online, either exclusively or in addition to print, and many journals have very limited printing runs ( 32 ). Since there are no longer page limits to journals, any good work can and should be published. Consequently, being selective for the purpose of saving space in a journal is no longer a valid excuse that peer reviewers can use to reject a paper ( 32 ). However, some reviewers have used this excuse when they have personal ulterior motives, such as getting their own research published first.

RECENT INITIATIVES TOWARDS IMPROVING PEER REVIEW

F1000Research was launched in January 2013 by Faculty of 1000 as an open access journal that immediately publishes papers (after an initial check to ensure that the paper is in fact produced by a scientist and has not been plagiarised), and then conducts transparent post-publication peer review ( 32 ). F1000Research aims to prevent delays in new science reaching the academic community that are caused by prolonged publication times ( 32 ). It also aims to make peer reviewing more fair by eliminating any anonymity, which prevents reviewers from delaying the completion of a review so they can publish their own similar work first ( 32 ). F1000Research offers completely open peer review, where everything is published, including the name of the reviewers, their review reports, and the editorial decision letters ( 32 ).

PeerJ was founded by Jason Hoyt and Peter Binfield in June 2012 as an open access, peer reviewed scholarly journal for the Biological and Medical Sciences ( 33 ). PeerJ selects articles to publish based only on scientific and methodological soundness, not on subjective determinants of ‘impact’, ‘novelty’ or ‘interest’ ( 34 ). It works on a “lifetime publishing plan” model which charges scientists for publishing plans that give them lifetime rights to publish with PeerJ, rather than charging them per publication ( 34 ). PeerJ also encourages open peer review, and authors are given the option to post the full peer review history of their submission with their published article ( 34 ). PeerJ also offers a pre-print review service called PeerJ Pre-prints, in which paper drafts are reviewed before being sent to PeerJ to publish ( 34 ).

Rubriq is an independent peer review service designed by Shashi Mudunuri and Keith Collier to improve the peer review system ( 35 ). Rubriq is intended to decrease redundancy in the peer review process so that the time lost in redundant reviewing can be put back into research ( 35 ). According to Keith Collier, over 15 million hours are lost each year to redundant peer review, as papers get rejected from one journal and are subsequently submitted to a less prestigious journal where they are reviewed again ( 35 ). Authors often have to submit their manuscript to multiple journals, and are often rejected multiple times before they find the right match. This process could take months or even years ( 35 ). Rubriq makes peer review portable in order to help authors choose the journal that is best suited for their manuscript from the beginning, thus reducing the time before their paper is published ( 35 ). Rubriq operates under an author-pay model, in which the author pays a fee and their manuscript undergoes double-blind peer review by three expert academic reviewers using a standardized scorecard ( 35 ). The majority of the author’s fee goes towards a reviewer honorarium ( 35 ). The papers are also screened for plagiarism using iThenticate ( 35 ). Once the manuscript has been reviewed by the three experts, the most appropriate journal for submission is determined based on the topic and quality of the paper ( 35 ). The paper is returned to the author in 1-2 weeks with the Rubriq Report ( 35 ). The author can then submit their paper to the suggested journal with the Rubriq Report attached. The Rubriq Report will give the journal editors a much stronger incentive to consider the paper as it shows that three experts have recommended the paper to them ( 35 ). Rubriq also has its benefits for reviewers; the Rubriq scorecard gives structure to the peer review process, and thus makes it consistent and efficient, which decreases time and stress for the reviewer. Reviewers also receive feedback on their reviews and most significantly, they are compensated for their time ( 35 ). Journals also benefit, as they receive pre-screened papers, reducing the number of papers sent to their own reviewers, which often end up rejected ( 35 ). This can reduce reviewer fatigue, and allow only higher-quality articles to be sent to their peer reviewers ( 35 ).
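
As a rough illustration of how a standardized scorecard can make reviews consistent, the sketch below averages three reviewers’ ratings over fixed criteria. The criterion names and the 1-5 scale are assumptions made for this example; Rubriq’s actual scorecard is not described here.

```python
from statistics import mean

# Hypothetical criteria and 1-5 ratings from three independent reviewers;
# the real Rubriq scorecard is proprietary, so these names are placeholders.
CRITERIA = ["novelty", "methods", "clarity", "significance"]
reviews = [
    {"novelty": 4, "methods": 3, "clarity": 4, "significance": 3},
    {"novelty": 3, "methods": 4, "clarity": 5, "significance": 3},
    {"novelty": 4, "methods": 4, "clarity": 4, "significance": 4},
]

# Average each criterion across reviewers, then average the criteria.
per_criterion = {c: mean(r[c] for r in reviews) for c in CRITERIA}
overall = mean(per_criterion.values())

for criterion, score in per_criterion.items():
    print(f"{criterion:<13}{score:.2f}")
print(f"{'overall':<13}{overall:.2f}")
```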

According to Eva Amsen, peer review and scientific publishing are moving in a new direction, in which all papers will be posted online, and a post-publication peer review will take place that is independent of specific journal criteria and solely focused on improving paper quality ( 32 ). Journals will then choose papers that they find relevant based on the peer reviews and publish those papers as a collection ( 32 ). In this process, peer review and individual journals are uncoupled ( 32 ). In Keith Collier’s opinion, post-publication peer review is likely to become more prevalent as a complement to pre-publication peer review, but not as a replacement ( 35 ). Post-publication peer review will not serve to identify errors and fraud but will provide an additional measurement of impact ( 35 ). Collier also believes that as journals and publishers consolidate into larger systems, there will be stronger potential for “cascading” and shared peer review ( 35 ).

CONCLUDING REMARKS

Peer review has become fundamental in assisting editors to select credible, high quality, novel and interesting research papers for publication in scientific journals, and in ensuring the correction of any errors or issues present in submitted papers. Though the peer review process still has flaws and deficiencies, a more suitable screening method for scientific papers has not yet been proposed or developed. Researchers have begun, and must continue, to look for ways of addressing the current issues with peer review so that it becomes as close to a foolproof system as possible, one that allows only quality research papers into the scientific literature.
