Grad Coach

How To Write A Research Paper

Step-By-Step Tutorial With Examples + FREE Template

By: Derek Jansen (MBA) | Expert Reviewer: Dr Eunice Rautenbach | March 2024

For many students, crafting a strong research paper from scratch can feel like a daunting task – and rightly so! In this post, we’ll unpack what a research paper is, what it needs to do, and how to write one – in three easy steps. 🙂

Overview: Writing A Research Paper

  • What (exactly) is a research paper?
  • How to write a research paper
  • Stage 1: Topic & literature search
  • Stage 2: Structure & outline
  • Stage 3: Iterative writing
  • Key takeaways

Let’s start by asking the most important question: “What is a research paper?”

Simply put, a research paper is a scholarly written work where the writer (that’s you!) answers a specific question (this is called a research question) through evidence-based arguments. Evidence-based is the keyword here. In other words, a research paper is different from an essay or other writing assignments that draw from the writer’s personal opinions or experiences. With a research paper, it’s all about building your arguments based on evidence (we’ll talk more about that evidence a little later).

Now, it’s worth noting that there are many different types of research papers, including analytical papers (the type I just described), argumentative papers, and interpretative papers. Here, we’ll focus on analytical papers, as these are some of the most common – but if you’re keen to learn about other types of research papers, be sure to check out the rest of the blog.

With that basic foundation laid, let’s get down to business and look at how to write a research paper.


Overview: The 3-Stage Process

While there are, of course, many potential approaches you can take to write a research paper, there are typically three stages to the writing process. So, in this tutorial, we’ll present a straightforward three-step process that we use when working with students at Grad Coach.

These three steps are:

  • Finding a research topic and reviewing the existing literature
  • Developing a provisional structure and outline for your paper, and
  • Writing up your initial draft and then refining it iteratively

Let’s dig into each of these.


Step 1: Find a topic and review the literature

As we mentioned earlier, in a research paper, you, as the researcher, will try to answer a question. More specifically, that’s called a research question, and it sets the direction of your entire paper. What’s important to understand though is that you’ll need to answer that research question with the help of high-quality sources – for example, journal articles, government reports, case studies, and so on. We’ll circle back to this in a minute.

The first stage of the research process is deciding on what your research question will be and then reviewing the existing literature (in other words, past studies and papers) to see what they say about that specific research question. In some cases, your professor may provide you with a predetermined research question (or set of questions). However, in many cases, you’ll need to find your own research question within a certain topic area.

Finding a strong research question hinges on identifying a meaningful research gap – in other words, an area that’s lacking in existing research. There’s a lot to unpack here, so if you wanna learn more, check out the plain-language explainer video below.

Once you’ve figured out which question (or questions) you’ll attempt to answer in your research paper, you’ll need to do a deep dive into the existing literature – this is called a “literature search”. Again, there are many ways to go about this, but your most likely starting point will be Google Scholar.

If you’re new to Google Scholar, think of it as Google for the academic world. You can start by simply entering a few different keywords that are relevant to your research question and it will then present a host of articles for you to review. What you want to pay close attention to here is the number of citations for each paper – the more citations a paper has, the more credible it is (generally speaking – there are some exceptions, of course).


Ideally, what you’re looking for are well-cited papers that are highly relevant to your topic. That said, keep in mind that citations are a cumulative metric, so older papers will often have more citations than newer papers – just because they’ve been around for longer. So, don’t fixate on this metric in isolation – relevance and recency are also very important.

Beyond Google Scholar, you’ll also definitely want to check out academic databases and aggregators such as ScienceDirect, PubMed, JSTOR, and so on. These will often overlap with the results that you find in Google Scholar, but they can also reveal some hidden gems – so, be sure to check them out.

Once you’ve worked your way through all the literature, you’ll want to catalogue all this information in some sort of spreadsheet so that you can easily recall who said what, when and within what context. If you’d like, we’ve got a free literature spreadsheet that helps you do exactly that.
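If you prefer to keep that catalogue programmatically rather than in a spreadsheet tool, the same idea can be sketched in a few lines of Python. This is only a rough sketch: the column names and the sample entry below are hypothetical placeholders, not the columns from the free spreadsheet mentioned above.

```python
# A minimal sketch of a literature catalogue written out as a CSV file.
# The column names and the example entry are hypothetical; adapt them
# to whatever fields you actually want to track.
import csv

entries = [
    {
        "author": "Smith & Jones",          # hypothetical source
        "year": 2021,
        "key_argument": "X is driven mainly by Y",
        "method": "Survey (n = 300)",
        "relevance_to_rq": "Directly addresses the research question",
        "citation_count": 154,
    },
]

with open("literature_catalogue.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(entries[0].keys()))
    writer.writeheader()
    writer.writerows(entries)
```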

Don’t fixate on an article’s citation count in isolation - relevance (to your research question) and recency are also very important.

Step 2: Develop a structure and outline

With your research question pinned down and your literature digested and catalogued, it’s time to move on to planning your actual research paper.

It might sound obvious, but it’s really important to have some sort of rough outline in place before you start writing your paper. So often, we see students eagerly rushing into the writing phase, only to end up with a disjointed research paper that rambles on in multiple directions.

Now, the secret here is to not get caught up in the fine details. Realistically, all you need at this stage is a bullet-point list that describes (in broad strokes) what you’ll discuss and in what order. It’s also useful to remember that you’re not glued to this outline – in all likelihood, you’ll chop and change some sections once you start writing, and that’s perfectly okay. What’s important is that you have some sort of roadmap in place from the start.

You need to have a rough outline in place before you start writing your paper - or you’ll end up with a disjointed research paper that rambles on.

At this stage you might be wondering, “ But how should I structure my research paper? ”. Well, there’s no one-size-fits-all solution here, but in general, a research paper will consist of a few relatively standardised components:

  • Introduction
  • Literature review
  • Methodology
  • Results (or analysis)
  • Discussion
  • Conclusion

Let’s take a look at each of these.

First up is the introduction section. As the name suggests, the purpose of the introduction is to set the scene for your research paper. There are usually (at least) four ingredients that go into this section – these are the background to the topic, the research problem and resultant research question, and the justification or rationale. If you’re interested, the video below unpacks the introduction section in more detail.

The next section of your research paper will typically be your literature review. Remember all that literature you worked through earlier? Well, this is where you’ll present your interpretation of all that content. You’ll do this by writing about recent trends, developments, and arguments within the literature – but more specifically, those that are relevant to your research question. The literature review can oftentimes seem a little daunting, even to seasoned researchers, so be sure to check out our extensive collection of literature review content here.

With the introduction and lit review out of the way, the next section of your paper is the research methodology. In a nutshell, the methodology section should describe to your reader what you did (beyond just reviewing the existing literature) to answer your research question. For example, what data did you collect, how did you collect that data, how did you analyse that data, and so on? For each choice, you’ll also need to justify why you chose to do it that way, and what the strengths and weaknesses of your approach were.

Now, it’s worth mentioning that for some research papers, this aspect of the project may be a lot simpler. For example, you may only need to draw on secondary sources (in other words, existing data sets). In some cases, you may just be asked to draw your conclusions from the literature search itself (in other words, there may be no data analysis at all). But, if you are required to collect and analyse data, you’ll need to pay a lot of attention to the methodology section. The video below provides an example of what the methodology section might look like.

By this stage of your paper, you will have explained what your research question is, what the existing literature has to say about that question, and how you analysed additional data to try to answer your question. So, the natural next step is to present your analysis of that data. This section is usually called the “results” or “analysis” section and this is where you’ll showcase your findings.

Depending on your school’s requirements, you may need to present and interpret the data in one section – or you might split the presentation and the interpretation into two sections. In the latter case, your “results” section will just describe the data, and the “discussion” is where you’ll interpret that data and explicitly link your analysis back to your research question. If you’re not sure which approach to take, check in with your professor or take a look at past papers to see what the norms are for your programme.

Alright – once you’ve presented and discussed your results, it’s time to wrap it up. This usually takes the form of the “conclusion” section. In the conclusion, you’ll need to highlight the key takeaways from your study and close the loop by explicitly answering your research question. Again, the exact requirements here will vary depending on your programme (and you may not even need a conclusion section at all) – so be sure to check with your professor if you’re unsure.

Step 3: Write and refine

Finally, it’s time to get writing. All too often though, students hit a brick wall right about here… So, how do you avoid this happening to you?

Well, there’s a lot to be said when it comes to writing a research paper (or any sort of academic piece), but we’ll share three practical tips to help you get started.

First and foremost, it’s essential to approach your writing as an iterative process. In other words, you need to start with a really messy first draft and then polish it over multiple rounds of editing. Don’t waste your time trying to write a perfect research paper in one go. Instead, take the pressure off yourself by adopting an iterative approach.

Secondly, it’s important to always lean towards critical writing, rather than descriptive writing. What does this mean? Well, at the simplest level, descriptive writing focuses on the “what”, while critical writing digs into the “so what” – in other words, the implications. If you’re not familiar with these two types of writing, don’t worry! You can find a plain-language explanation here.

Last but not least, you’ll need to get your referencing right. Specifically, you’ll need to provide credible, correctly formatted citations for the statements you make. We see students making referencing mistakes all the time and it costs them dearly. The good news is that you can easily avoid this by using a simple reference manager. If you don’t have one, check out our video about Mendeley, an easy (and free) reference management tool that you can start using today.

Recap: Key Takeaways

We’ve covered a lot of ground here. To recap, the three steps to writing a high-quality research paper are:

  • To choose a research question and review the literature
  • To plan your paper structure and draft an outline
  • To take an iterative approach to writing, focusing on critical writing and strong referencing

Remember, this is just a big-picture overview of the research paper development process and there’s a lot more nuance to unpack. So, be sure to grab a copy of our free research paper template to learn more about how to write a research paper.


Rubric Design

Articulating Your Assessment Values

Reading, commenting on, and then assigning a grade to a piece of student writing requires intense attention and difficult judgment calls. Some faculty dread “the stack.” Students may share the faculty’s dim view of writing assessment, perceiving it as highly subjective. They wonder why one faculty member values evidence and correctness before all else, while another seeks a vaguely defined originality.

Writing rubrics can help address the concerns of both faculty and students by making writing assessment more efficient, consistent, and public. Whether it is called a grading rubric, a grading sheet, or a scoring guide, a writing assignment rubric lists criteria by which the writing is graded.

Why create a writing rubric?

  • It makes your tacit rhetorical knowledge explicit
  • It articulates community- and discipline-specific standards of excellence
  • It links the grade you give the assignment to the criteria
  • It can make your grading more efficient, consistent, and fair as you can read and comment with your criteria in mind
  • It can help you reverse engineer your course: once you have the rubrics created, you can align your readings, activities, and lectures with the rubrics to set your students up for success
  • It can help your students produce writing that you look forward to reading

How to create a writing rubric

Create a rubric at the same time you create the assignment. It will help you explain to the students what your goals are for the assignment.

  • Consider your purpose: do you need a rubric that addresses the standards for all the writing in the course? Or do you need to address the writing requirements and standards for just one assignment?  Task-specific rubrics are written to help teachers assess individual assignments or genres, whereas generic rubrics are written to help teachers assess multiple assignments.
  • Begin by listing the important qualities of the writing that will be produced in response to a particular assignment. It may be helpful to have several examples of excellent versions of the assignment in front of you: what writing elements do they all have in common? Among other things, these may include features of the argument, such as a main claim or thesis; use and presentation of sources, including visuals; and formatting guidelines such as the requirement of a works cited.
  • Then consider how the criteria will be weighted in grading. Perhaps all criteria are equally important, or perhaps there are two or three that all students must achieve to earn a passing grade. Decide what best fits the class and requirements of the assignment.

Consider involving students in Steps 2 and 3. A class session devoted to developing a rubric can provoke many important discussions about the ways the features of the language serve the purpose of the writing. And when students themselves work to describe the writing they are expected to produce, they are more likely to achieve it.

At this point, you will need to decide if you want to create a holistic or an analytic rubric. There is much debate about these two approaches to assessment.

Comparing Holistic and Analytic Rubrics

Holistic Scoring

Holistic scoring aims to rate overall proficiency in a given student writing sample. It is often used in large-scale writing program assessment and impromptu classroom writing for diagnostic purposes.

General tenets of holistic scoring:

  • Responding to drafts is part of evaluation
  • Responses do not focus on grammar and mechanics during drafting and there is little correction
  • Marginal comments are kept to 2-3 per page with summative comments at end
  • End commentary attends to students’ overall performance across learning objectives as articulated in the assignment
  • Response language aims to foster students’ self-assessment

Holistic rubrics emphasize what students do well and generally increase efficiency; they may also be more valid because scoring includes the authentic, personal reaction of the reader. But holistic scores won’t tell a student how they’ve progressed relative to previous assignments and may be rater-dependent, reducing reliability. (For a summary of advantages and disadvantages of holistic scoring, see Becker, 2011, p. 116.)

Here is an example of a partial holistic rubric:

Summary meets all the criteria. The writer understands the article thoroughly. The main points in the article appear in the summary with all main points proportionately developed. The summary should be as comprehensive as possible and should read smoothly, with appropriate transitions between ideas. Sentences should be clear, without vagueness or ambiguity and without grammatical or mechanical errors.

A complete holistic rubric for a research paper (authored by Jonah Willihnganz) can be  downloaded here.

Analytic Scoring

Analytic scoring makes explicit the contribution to the final grade of each element of writing. For example, an instructor may choose to give 30 points for an essay whose ideas are sufficiently complex, that marshals good reasons in support of a thesis, and whose argument is logical; and 20 points for well-constructed sentences and careful copy editing.
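To make the arithmetic concrete, here is a minimal sketch of how per-criterion points could be totalled into a final grade. It is illustrative only: the criterion labels and the point values other than the 30/20 split mentioned above are assumptions, not part of any particular rubric.

```python
# Illustrative analytic rubric: maximum points per criterion.
# The 30- and 20-point criteria echo the example above; the rest are hypothetical.
rubric = {
    "complex ideas, well-reasoned thesis, logical argument": 30,
    "well-constructed sentences and careful copy editing": 20,
    "use and presentation of sources": 25,
    "formatting and works cited": 25,
}

# Points awarded to one (hypothetical) student on each criterion.
awarded = {
    "complex ideas, well-reasoned thesis, logical argument": 24,
    "well-constructed sentences and careful copy editing": 18,
    "use and presentation of sources": 20,
    "formatting and works cited": 22,
}

total = sum(awarded.values())
maximum = sum(rubric.values())
print(f"{total}/{maximum} ({100 * total / maximum:.0f}%)")  # prints: 84/100 (84%)
```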

General tenets of analytic scoring:

  • Reflect emphases in your teaching and communicate the learning goals for the course
  • Emphasize student performance across criteria, which are established as central to the assignment in advance, usually on an assignment sheet
  • Typically take a quantitative approach, providing a scaled set of points for each criterion
  • Make the analytic framework available to students before they write  

Advantages of an analytic rubric include ease of training raters and improved reliability. Meanwhile, writers often can more easily diagnose the strengths and weaknesses of their work. But analytic rubrics can be time-consuming to produce, and raters may judge the writing holistically anyway. Moreover, many readers believe that writing traits cannot be separated. (For a summary of the advantages and disadvantages of analytic scoring, see Becker, 2011, p. 115.)

For example, a partial analytic rubric for a single trait, “addresses a significant issue”:

  • Excellent: Elegantly establishes the current problem, why it matters, to whom
  • Above Average: Identifies the problem; explains why it matters and to whom
  • Competent: Describes topic but relevance unclear or cursory
  • Developing: Unclear issue and relevance

A  complete analytic rubric for a research paper can be downloaded here.  In WIM courses, this language should be revised to name specific disciplinary conventions.

Whichever type of rubric you write, your goal is to avoid pushing students into prescriptive formulas and limiting thinking (e.g., “each paragraph has five sentences”). By carefully describing the writing you want to read, you give students a clear target, and, as Ed White puts it, “describe the ongoing work of the class” (75).

Writing rubrics contribute meaningfully to the teaching of writing. Think of them as a coaching aide. In class and in conferences, you can use the language of the rubric to help you move past generic statements about what makes good writing good to statements about what constitutes success on the assignment and in the genre or discourse community. The rubric articulates what you are asking students to produce on the page; once that work is accomplished, you can turn your attention to explaining how students can achieve it.

Works Cited

Becker, Anthony.  “Examining Rubrics Used to Measure Writing Performance in U.S. Intensive English Programs.”   The CATESOL Journal  22.1 (2010/2011):113-30. Web.

White, Edward M.  Teaching and Assessing Writing . Proquest Info and Learning, 1985. Print.

Further Resources

CCCC Committee on Assessment. “Writing Assessment: A Position Statement.” November 2006 (Revised March 2009). Conference on College Composition and Communication. Web.

Gallagher, Chris W. “Assess Locally, Validate Globally: Heuristics for Validating Local Writing Assessments.” Writing Program Administration 34.1 (2010): 10-32. Web.

Huot, Brian.  (Re)Articulating Writing Assessment for Teaching and Learning.  Logan: Utah State UP, 2002. Print.

Kelly-Reilly, Diane, and Peggy O’Neil, eds. Journal of Writing Assessment. Web.

McKee, Heidi A., and Dànielle Nicole DeVoss, eds. Digital Writing Assessment & Evaluation. Logan, UT: Computers and Composition Digital Press/Utah State University Press, 2013. Web.

O’Neill, Peggy, Cindy Moore, and Brian Huot.  A Guide to College Writing Assessment . Logan: Utah State UP, 2009. Print.

Sommers, Nancy.  Responding to Student Writers . Macmillan Higher Education, 2013.

Straub, Richard. “Responding, Really Responding to Other Students’ Writing.” The Subject is Writing: Essays by Teachers and Students. Ed. Wendy Bishop. Boynton/Cook, 1999. Web.

White, Edward M., and Cassie A. Wright.  Assigning, Responding, Evaluating: A Writing Teacher’s Guide . 5th ed. Bedford/St. Martin’s, 2015. Print.


Writing a Research Paper Introduction | Step-by-Step Guide

Published on September 24, 2022 by Jack Caulfield. Revised on March 27, 2023.


The introduction to a research paper is where you set up your topic and approach for the reader. It has several key goals:

  • Present your topic and get the reader interested
  • Provide background or summarize existing research
  • Position your own approach
  • Detail your specific research problem and problem statement
  • Give an overview of the paper’s structure

The introduction looks slightly different depending on whether your paper presents the results of original empirical research or constructs an argument by engaging with a variety of sources.


Table of contents

  • Step 1: Introduce your topic
  • Step 2: Describe the background
  • Step 3: Establish your research problem
  • Step 4: Specify your objective(s)
  • Step 5: Map out your paper
  • Research paper introduction examples
  • Frequently asked questions about the research paper introduction

Step 1: Introduce your topic

The first job of the introduction is to tell the reader what your topic is and why it’s interesting or important. This is generally accomplished with a strong opening hook.

The hook is a striking opening sentence that clearly conveys the relevance of your topic. Think of an interesting fact or statistic, a strong statement, a question, or a brief anecdote that will get the reader wondering about your topic.

For example, the following could be an effective hook for an argumentative paper about the environmental impact of cattle farming:

“Are cows responsible for climate change? A recent study (RIVM, 2019) shows that cattle farmers account for two thirds of agricultural nitrogen emissions in the Netherlands.”

A more empirical paper investigating the relationship of Instagram use with body image issues in adolescent girls might use the following hook:

“The rise of social media has been accompanied by a sharp increase in the prevalence of body image issues among women and girls.”

Don’t feel that your hook necessarily has to be deeply impressive or creative. Clarity and relevance are still more important than catchiness. The key thing is to guide the reader into your topic and situate your ideas.


Step 2: Describe the background

This part of the introduction differs depending on what approach your paper is taking.

In a more argumentative paper, you’ll explore some general background here. In a more empirical paper, this is the place to review previous research and establish how yours fits in.

Argumentative paper: Background information

After you’ve caught your reader’s attention, specify a bit more, providing context and narrowing down your topic.

Provide only the most relevant background information. The introduction isn’t the place to get too in-depth; if more background is essential to your paper, it can appear in the body.

Empirical paper: Describing previous research

For a paper describing original research, you’ll instead provide an overview of the most relevant research that has already been conducted. This is a sort of miniature literature review—a sketch of the current state of research into your topic, boiled down to a few sentences.

This should be informed by genuine engagement with the literature. Your search can be less extensive than in a full literature review, but a clear sense of the relevant research is crucial to inform your own work.

Begin by establishing the kinds of research that have been done, and end with limitations or gaps in the research that you intend to respond to.

Step 3: Establish your research problem

The next step is to clarify how your own research fits in and what problem it addresses.

Argumentative paper: Emphasize importance

In an argumentative research paper, you can simply state the problem you intend to discuss, and what is original or important about your argument.

Empirical paper: Relate to the literature

In an empirical research paper, try to lead into the problem on the basis of your discussion of the literature. Think in terms of these questions:

  • What research gap is your work intended to fill?
  • What limitations in previous work does it address?
  • What contribution to knowledge does it make?

You can make the connection between your problem and the existing research using phrases like the following.

Step 4: Specify your objective(s)

Now you’ll get into the specifics of what you intend to find out or express in your research paper.

The way you frame your research objectives varies. An argumentative paper presents a thesis statement, while an empirical paper generally poses a research question (sometimes with a hypothesis as to the answer).

Argumentative paper: Thesis statement

The thesis statement expresses the position that the rest of the paper will present evidence and arguments for. It can be presented in one or two sentences, and should state your position clearly and directly, without providing specific arguments for it at this point.

Empirical paper: Research question and hypothesis

The research question is the question you want to answer in an empirical research paper.

Present your research question clearly and directly, with a minimum of discussion at this point. The rest of the paper will be taken up with discussing and investigating this question; here you just need to express it.

A research question can be framed either directly or indirectly.

  • This study set out to answer the following question: What effects does daily use of Instagram have on the prevalence of body image issues among adolescent girls?
  • We investigated the effects of daily Instagram use on the prevalence of body image issues among adolescent girls.

If your research involved testing hypotheses, these should be stated along with your research question. They are usually presented in the past tense, since the hypothesis will already have been tested by the time you are writing up your paper.

For example, the following hypothesis might respond to the research question above:

“It was hypothesized that daily Instagram use would be associated with an increase in body image concerns and a decrease in self-esteem ratings.”

Step 5: Map out your paper

The final part of the introduction is often dedicated to a brief overview of the rest of the paper.

In a paper structured using the standard scientific “introduction, methods, results, discussion” format, this isn’t always necessary. But if your paper is structured in a less predictable way, it’s important to describe the shape of it for the reader.

If included, the overview should be concise, direct, and written in the present tense.

  • Future tense (avoid): This paper will first discuss several examples of survey-based research into adolescent social media use, then will go on to …
  • Present tense (preferred): This paper first discusses several examples of survey-based research into adolescent social media use, then goes on to …

Research paper introduction examples

Full examples of research paper introductions are shown below: one for an argumentative paper, the other for an empirical paper.

  • Argumentative paper
  • Empirical paper

Are cows responsible for climate change? A recent study (RIVM, 2019) shows that cattle farmers account for two thirds of agricultural nitrogen emissions in the Netherlands. These emissions result from nitrogen in manure, which can degrade into ammonia and enter the atmosphere. The study’s calculations show that agriculture is the main source of nitrogen pollution, accounting for 46% of the country’s total emissions. By comparison, road traffic and households are responsible for 6.1% each, the industrial sector for 1%. While efforts are being made to mitigate these emissions, policymakers are reluctant to reckon with the scale of the problem. The approach presented here is a radical one, but commensurate with the issue. This paper argues that the Dutch government must stimulate and subsidize livestock farmers, especially cattle farmers, to transition to sustainable vegetable farming. It first establishes the inadequacy of current mitigation measures, then discusses the various advantages of the results proposed, and finally addresses potential objections to the plan on economic grounds.

The rise of social media has been accompanied by a sharp increase in the prevalence of body image issues among women and girls. This correlation has received significant academic attention: Various empirical studies have been conducted into Facebook usage among adolescent girls (Tiggermann & Slater, 2013; Meier & Gray, 2014). These studies have consistently found that the visual and interactive aspects of the platform have the greatest influence on body image issues. Despite this, highly visual social media (HVSM) such as Instagram have yet to be robustly researched. This paper sets out to address this research gap. We investigated the effects of daily Instagram use on the prevalence of body image issues among adolescent girls. It was hypothesized that daily Instagram use would be associated with an increase in body image concerns and a decrease in self-esteem ratings.

Frequently asked questions about the research paper introduction

The introduction of a research paper includes several key elements:

  • A hook to catch the reader’s interest
  • Relevant background on the topic
  • Details of your research problem and your problem statement
  • A thesis statement or research question
  • Sometimes an overview of the paper

Don’t feel that you have to write the introduction first. The introduction is often one of the last parts of the research paper you’ll write, along with the conclusion.

This is because it can be easier to introduce your paper once you’ve already written the body; you may not have the clearest idea of your arguments until you’ve written them, and things can change during the writing process.

The way you present your research problem in your introduction varies depending on the nature of your research paper. A research paper that presents a sustained argument will usually encapsulate this argument in a thesis statement.

A research paper designed to present the results of empirical research tends to present a research question that it seeks to answer. It may also include a hypothesis—a prediction that will be confirmed or disproved by your research.



Research Questions in Language Education and Applied Linguistics, pp. 431–435

Writing Assessment Literacy

Deborah Crusan | First Online: 13 January 2022 | Springer Texts in Education

Although classroom writing assessment is a significant responsibility for writing teachers, many instructors lack an understanding of sound and effective assessment practices in the writing classroom, also known as Writing Assessment Literacy (WAL).




The Research Questions

In what ways have second language writing teachers obtained assessment knowledge?

What are the common beliefs held by second language writing teachers about writing assessment?

What are the assessment practices of second language writing teachers?

What is the impact of linguistic background on writing assessment knowledge, beliefs, and practices?

What is the impact of teaching experience on writing assessment knowledge, beliefs, and practices?

What skills are necessary for teachers to claim that they are literate in writing assessment?

How can teaching be enhanced through writing assessment literacy?

In what ways might teachers’ work be shaped by testing policies and practices?

How does context affect teacher practices in writing assessment?

Should teachers be required to provide evidence of writing assessment literacy? If so, how?

Suggested Resources

Crusan, D., Plakans, L., & Gebril, A. (2016). Writing assessment literacy: Surveying second language teachers’ knowledge, beliefs, and practices. Assessing Writing , 28 , 43–56. https://doi.org/10.1016/j.asw.2016.03.001 .

Claiming a teacher knowledge gap in all aspects of writing assessment, the authors explore ways in which writing teachers have obtained writing assessment literacy. Asserting that teachers often feel un(der)prepared for the assessment tasks they face in the writing classroom, the researchers surveyed 702 writing teachers from around the globe to establish evidence for this claim; the researchers found that although teachers professed education in assessment in general and writing assessment in particular, these same teachers worried that they are less than sure of their abilities in rubric creation, written corrective feedback, and justification of grades, crucial elements in the assessment of writing. The study also uncovered interesting notions about linguistic background: NNESTs reported higher levels of writing assessment literacy.

Fernando, W. (2018). Show me your true colours: Scaffolding formative academic literacy assessment through an online learning platform. Assessing Writing, 36, 63–76. https://doi.org/10.1016/j.asw.2018.03.005

In this paper, the author focuses on student writing processes and examines ways those processes are affected by students’ formative academic literacy assessment. Does formative academic literacy assessment promote more engagement with composing processes, and if so, what evidence supports this proposition? To investigate, the author asked students to use an online learning platform to generate data in the form of outlines/essays with feedback, student-generated digital artefacts, and questionnaires/follow-up interviews to answer two important questions: “(1) How can formative academic literacy assessment help students engage in composing processes and improve their writing? (2) How can online technology be used to facilitate and formatively assess student engagement with composing processes?” (p. 65). Her findings indicate that students benefit from understanding their composing processes; additionally, this understanding is furthered by scaffolding formative academic literacy assessment through an online platform, which helps uncover and overcome students’ difficulties as they learn to write.

Inbar-Lourie, O. (2017). Language Assessment Literacy. In: E. Shohamy, I. Or, & S. May (Eds.), Language Testing and Assessment: Encyclopedia of Language and Education (3rd ed.). Cham, Switzerland: Springer.

While this encyclopedia entry does not specifically address writing assessment literacy, it provides an excellent definition of language assessment literacy (LAL), broadly informs the field of this definition, and argues for the need for teachers to be assessment literate. It calls for a literacy framework in language assessment to be defined, and cites the divide between views of formative and dynamic assessment as a matter still in need of resolution. Operationalization of a theoretical framework remains important, but contextualized definitions might be the more judicious way to approach this issue, since many different stakeholders (e.g. classroom teachers, students, parents, test developers) are involved.

Lam, R. (2015). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32 (2), 169–197. https://doi.org/10.1177/0265532214554321

Although this article is not specifically about writing assessment literacy, its author makes the same arguments for training of teachers as those who argue for writing assessment literacy for teachers. These arguments bring home the lack of teacher education in assessment in general and writing assessment in particular. Lam surveys various documents, conducts interviews, and examines student assessment tasks at five institutions in Hong Kong, targeting pre-service teachers being trained for the primary and secondary school settings. Lam uncovered five themes running through the data, from which he distills three key issues: (1) local teacher education programs should support further language assessment training, (2) the definition of LAL should be understood from an ethical perspective, and (3) those who administer the programs in Lam’s study should collaborate to assure that pre-service teachers meet compulsory standards for language assessment literacy.

Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58 , 149–162.

The authors synthesized 100 studies concerning teacher assessment literacy (TAL) to determine what has and has not worked in the advancement of TAL. Based on the findings of this comprehensive literature review, the authors developed a conceptual framework of Teacher Assessment Literacy in Practice (TALiP), which calls for a crucial knowledge base for all teachers, but which the authors call necessary but not sufficient. They then go on to categorize other aspects that need to be considered before a teacher can be fully assessment literate, calling the symbiosis between and among components reciprocal. Along with the knowledge base, elements include the ways in which teachers intellectualize assessment, the contexts involved (especially institutional and socio-cultural), TALiP (the framework’s primary notion), teacher learning, and whether (and how) teachers view themselves as competent assessors. The framework has implications for multiple platforms: pre-service teacher education, in-service teacher education, and teacher training challenges encountered when attempting expansion of TAL.


Online Guide to Writing and Research

Assessing Your Writing


Using Assessment to Improve Your Writing


When you assess your writing, you are committing to improve it. Your commitment to improve will certainly include making adjustments to papers and correcting errors. However, assessing your writing can also motivate you to practice writing. As with anything, practicing will help your writing improve. Below are a series of habits that can help as you practice your writing skills.

Learn from your mistakes.

Watch for patterns of strengths and weaknesses in your writing. Learn how to build on your strengths and address your weaknesses. Then, consciously and methodically work on improving them.

Analyze examples of good writing.

By understanding how other writers have succeeded in writing effectively, you can improve your skills and strategies. Keep a notebook of writing samples that includes your notations about what works and what does not.

Build your skills as you go along.

Work methodically on your writing, focusing on specific skills each time you write a paper. Keep a journal or course notebook so you can practice writing in a specific discipline. Do not wait until the last minute to work on your writing assignments.

Take advantage of informal writing assignments.

Some courses require you to keep a journal or notebook with questions, case-study discussions, or laboratory notes. Use these opportunities to practice writing topic sentences, thesis statements, and major and minor supports. Practice writing effective sentences in a unified paragraph, or practice freewriting. Use these shorter assignments as warm‑ups for the longer research papers.

Look for extra writing opportunities in all your courses.

When you participate in a group assignment, volunteer to take minutes or write summaries of group discussions. Produce written proceedings of oral presentations. Ask your instructor for extra credit for writing extra papers.

Take writing courses beyond your requirements.

All writing courses, including creative writing, will help you improve your writing skills. Take a variety of writing courses to help you broaden your vocabulary, style, sentence structure, organization, and critical thinking skills.

Take writing seminars offered by professional organizations.

Many professional organizations offer short writing courses to help you learn new skills or new kinds of writing. Other organizations offer Internet-based courses to teach new writing skills, review grammar, and build vocabulary.

Keep a journal or writing log.

Keep tabs on your writing plan and your improvement. Set aside 10 or 15 minutes daily to review what you are focusing on and practicing; set aside another 10 or 15 minutes to practice it. Set goals to learn a new writing skill each week or month.

Read, read, read.

By reading good writing, you will be able to identify solid writing models and improve your sentence structure, spelling, punctuation, and vocabulary.

Write, write, write.

Reading will help you write better. However, you should also practice what you have learned. Take every opportunity to practice effective techniques you have noticed while reading.


Research Method


Research Paper – Structure, Examples and Writing Guide


Research Paper

Definition:

Research Paper is a written document that presents the author’s original research, analysis, and interpretation of a specific topic or issue.

It is typically based on empirical evidence and may involve qualitative or quantitative research methods, or a combination of both. The purpose of a research paper is to contribute new knowledge or insights to a particular field of study, and to demonstrate the author’s understanding of the existing literature and theories related to the topic.

Structure of Research Paper

The structure of a research paper typically follows a standard format, consisting of several sections that convey specific information about the research study. The following is a detailed explanation of the structure of a research paper:

Title Page

The title page contains the title of the paper, the name(s) of the author(s), and the affiliation(s) of the author(s). It also includes the date of submission and, possibly, the name of the journal or conference where the paper is to be published.

Abstract

The abstract is a brief summary of the research paper, typically ranging from 100 to 250 words. It should include the research question, the methods used, the key findings, and the implications of the results. The abstract should be written in a concise and clear manner to allow readers to quickly grasp the essence of the research.

Introduction

The introduction section of a research paper provides background information about the research problem, the research question, and the research objectives. It also outlines the significance of the research, the research gap that it aims to fill, and the approach taken to address the research question. Finally, the introduction section ends with a clear statement of the research hypothesis or research question.

Literature Review

The literature review section of a research paper provides an overview of the existing literature on the topic of study. It includes a critical analysis and synthesis of the literature, highlighting the key concepts, themes, and debates. The literature review should also demonstrate the research gap and how the current study seeks to address it.

Methods

The methods section of a research paper describes the research design, the sample selection, the data collection and analysis procedures, and the statistical methods used to analyze the data. This section should provide sufficient detail for other researchers to replicate the study.

Results

The results section presents the findings of the research, using tables, graphs, and figures to illustrate the data. The findings should be presented in a clear and concise manner, with reference to the research question and hypothesis.

Discussion

The discussion section of a research paper interprets the findings and discusses their implications for the research question, the literature review, and the field of study. It should also address the limitations of the study and suggest future research directions.

Conclusion

The conclusion section summarizes the main findings of the study, restates the research question and hypothesis, and provides a final reflection on the significance of the research.

References

The references section provides a list of all the sources cited in the paper, following a specific citation style such as APA, MLA, or Chicago.

How to Write Research Paper

You can write a research paper by following the steps below:

  • Choose a Topic: The first step is to select a topic that interests you and is relevant to your field of study. Brainstorm ideas and narrow down to a research question that is specific and researchable.
  • Conduct a Literature Review: The literature review helps you identify the gap in the existing research and provides a basis for your research question. It also helps you to develop a theoretical framework and research hypothesis.
  • Develop a Thesis Statement: The thesis statement is the main argument of your research paper. It should be clear, concise and specific to your research question.
  • Plan your Research: Develop a research plan that outlines the methods, data sources, and data analysis procedures. This will help you to collect and analyze data effectively.
  • Collect and Analyze Data: Collect data using various methods such as surveys, interviews, observations, or experiments. Analyze data using statistical tools or other qualitative methods.
  • Organize your Paper: Organize your paper into sections such as Introduction, Literature Review, Methods, Results, Discussion, and Conclusion. Ensure that each section is coherent and follows a logical flow.
  • Write your Paper: Start by writing the introduction, followed by the literature review, methods, results, discussion, and conclusion. Ensure that your writing is clear, concise, and follows the required formatting and citation styles.
  • Edit and Proofread your Paper: Review your paper for grammar and spelling errors, and ensure that it is well-structured and easy to read. Ask someone else to review your paper to get feedback and suggestions for improvement.
  • Cite your Sources: Ensure that you properly cite all sources used in your research paper. This is essential for giving credit to the original authors and avoiding plagiarism.

Research Paper Example

Note: The example research paper below is for illustrative purposes only and is not an actual research paper. Actual research papers may have different structures, contents, and formats depending on the field of study, research question, data collection and analysis methods, and other factors. Students should always consult with their professors or supervisors for specific guidelines and expectations for their research papers.

Research Paper Example for Students:

Title: The Impact of Social Media on Mental Health among Young Adults

Abstract: This study aims to investigate the impact of social media use on the mental health of young adults. A literature review was conducted to examine the existing research on the topic. A survey was then administered to 200 university students to collect data on their social media use, mental health status, and perceived impact of social media on their mental health. The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO (Fear of Missing Out) are significant predictors of mental health problems among young adults.

Introduction: Social media has become an integral part of modern life, particularly among young adults. While social media has many benefits, including increased communication and social connectivity, it has also been associated with negative outcomes, such as addiction, cyberbullying, and mental health problems. This study aims to investigate the impact of social media use on the mental health of young adults.

Literature Review: The literature review highlights the existing research on the impact of social media use on mental health. The review shows that social media use is associated with depression, anxiety, stress, and other mental health problems. The review also identifies the factors that contribute to the negative impact of social media, including social comparison, cyberbullying, and FOMO.

Methods: A survey was administered to 200 university students to collect data on their social media use, mental health status, and perceived impact of social media on their mental health. The survey included questions on social media use, mental health status (measured using the DASS-21), and perceived impact of social media on their mental health. Data were analyzed using descriptive statistics and regression analysis.
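To make the analysis step more concrete, the sketch below shows one way the descriptive statistics and regression analysis described above could be run in Python. It is an illustrative sketch only; the file name and column names (survey_responses.csv, hours_on_social_media, social_comparison, cyberbullying, fomo, dass_depression) are hypothetical placeholders, not part of the example study.

    # Illustrative sketch: descriptive statistics and an OLS regression
    # predicting DASS-21 depression scores from social media use and the
    # three hypothesized risk factors. All names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("survey_responses.csv")  # hypothetical survey data file

    # Descriptive statistics for the survey variables
    print(df.describe())

    # Regression of depression scores on social media use and risk factors
    model = smf.ols(
        "dass_depression ~ hours_on_social_media + social_comparison + cyberbullying + fomo",
        data=df,
    ).fit()
    print(model.summary())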

Results: The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO are significant predictors of mental health problems among young adults.

Discussion: The study’s findings suggest that social media use has a negative impact on the mental health of young adults. The study highlights the need for interventions that address the factors contributing to the negative impact of social media, such as social comparison, cyberbullying, and FOMO.

Conclusion: In conclusion, social media use has a significant impact on the mental health of young adults. The study’s findings underscore the need for interventions that promote healthy social media use and address the negative outcomes associated with social media use. Future research can explore the effectiveness of interventions aimed at reducing the negative impact of social media on mental health. Additionally, longitudinal studies can investigate the long-term effects of social media use on mental health.

Limitations: The study has some limitations, including the use of self-report measures and a cross-sectional design. The use of self-report measures may result in biased responses, and a cross-sectional design limits the ability to establish causality.

Implications: The study’s findings have implications for mental health professionals, educators, and policymakers. Mental health professionals can use the findings to develop interventions that address the negative impact of social media use on mental health. Educators can incorporate social media literacy into their curriculum to promote healthy social media use among young adults. Policymakers can use the findings to develop policies that protect young adults from the negative outcomes associated with social media use.

References:

  • Twenge, J. M., & Campbell, W. K. (2019). Associations between screen time and lower psychological well-being among children and adolescents: Evidence from a population-based study. Preventive medicine reports, 15, 100918.
  • Primack, B. A., Shensa, A., Escobar-Viera, C. G., Barrett, E. L., Sidani, J. E., Colditz, J. B., … & James, A. E. (2017). Use of multiple social media platforms and symptoms of depression and anxiety: A nationally-representative study among US young adults. Computers in Human Behavior, 69, 1-9.
  • Van der Meer, T. G., & Verhoeven, J. W. (2017). Social media and its impact on academic performance of students. Journal of Information Technology Education: Research, 16, 383-398.

Appendix: The survey used in this study is provided below.

Social Media and Mental Health Survey

  • How often do you use social media per day?
    • Less than 30 minutes
    • 30 minutes to 1 hour
    • 1 to 2 hours
    • 2 to 4 hours
    • More than 4 hours
  • Which social media platforms do you use?
    • Others (Please specify)
  • How often do you experience the following on social media?
    • Social comparison (comparing yourself to others)
    • Cyberbullying
    • Fear of Missing Out (FOMO)
  • Have you ever experienced any of the following mental health problems in the past month?
  • Do you think social media use has a positive or negative impact on your mental health?
    • Very positive
    • Somewhat positive
    • Somewhat negative
    • Very negative
  • In your opinion, which factors contribute to the negative impact of social media on mental health?
    • Social comparison
  • In your opinion, what interventions could be effective in reducing the negative impact of social media on mental health?
    • Education on healthy social media use
    • Counseling for mental health problems caused by social media
    • Social media detox programs
    • Regulation of social media use

Thank you for your participation!

Applications of Research Papers

Research papers have several applications in various fields, including:

  • Advancing knowledge: Research papers contribute to the advancement of knowledge by generating new insights, theories, and findings that can inform future research and practice. They help to answer important questions, clarify existing knowledge, and identify areas that require further investigation.
  • Informing policy: Research papers can inform policy decisions by providing evidence-based recommendations for policymakers. They can help to identify gaps in current policies, evaluate the effectiveness of interventions, and inform the development of new policies and regulations.
  • Improving practice: Research papers can improve practice by providing evidence-based guidance for professionals in various fields, including medicine, education, business, and psychology. They can inform the development of best practices, guidelines, and standards of care that can improve outcomes for individuals and organizations.
  • Educating students : Research papers are often used as teaching tools in universities and colleges to educate students about research methods, data analysis, and academic writing. They help students to develop critical thinking skills, research skills, and communication skills that are essential for success in many careers.
  • Fostering collaboration: Research papers can foster collaboration among researchers, practitioners, and policymakers by providing a platform for sharing knowledge and ideas. They can facilitate interdisciplinary collaborations and partnerships that can lead to innovative solutions to complex problems.

When to Write a Research Paper

Research papers are typically written when a person has completed a research project or when they have conducted a study and have obtained data or findings that they want to share with the academic or professional community. Research papers are usually written in academic settings, such as universities, but they can also be written in professional settings, such as research organizations, government agencies, or private companies.

Here are some common situations where a person might need to write a research paper:

  • For academic purposes: Students in universities and colleges are often required to write research papers as part of their coursework, particularly in the social sciences, natural sciences, and humanities. Writing research papers helps students to develop research skills, critical thinking skills, and academic writing skills.
  • For publication: Researchers often write research papers to publish their findings in academic journals or to present their work at academic conferences. Publishing research papers is an important way to disseminate research findings to the academic community and to establish oneself as an expert in a particular field.
  • To inform policy or practice : Researchers may write research papers to inform policy decisions or to improve practice in various fields. Research findings can be used to inform the development of policies, guidelines, and best practices that can improve outcomes for individuals and organizations.
  • To share new insights or ideas: Researchers may write research papers to share new insights or ideas with the academic or professional community. They may present new theories, propose new research methods, or challenge existing paradigms in their field.

Purpose of a Research Paper

The purpose of a research paper is to present the results of a study or investigation in a clear, concise, and structured manner. Research papers are written to communicate new knowledge, ideas, or findings to a specific audience, such as researchers, scholars, practitioners, or policymakers. The primary purposes of a research paper are:

  • To contribute to the body of knowledge : Research papers aim to add new knowledge or insights to a particular field or discipline. They do this by reporting the results of empirical studies, reviewing and synthesizing existing literature, proposing new theories, or providing new perspectives on a topic.
  • To inform or persuade: Research papers are written to inform or persuade the reader about a particular issue, topic, or phenomenon. They present evidence and arguments to support their claims and seek to persuade the reader of the validity of their findings or recommendations.
  • To advance the field: Research papers seek to advance the field or discipline by identifying gaps in knowledge, proposing new research questions or approaches, or challenging existing assumptions or paradigms. They aim to contribute to ongoing debates and discussions within a field and to stimulate further research and inquiry.
  • To demonstrate research skills: Research papers demonstrate the author’s research skills, including their ability to design and conduct a study, collect and analyze data, and interpret and communicate findings. They also demonstrate the author’s ability to critically evaluate existing literature, synthesize information from multiple sources, and write in a clear and structured manner.

Characteristics of Research Papers

Research papers have several characteristics that distinguish them from other forms of academic or professional writing. Here are some common characteristics of research papers:

  • Evidence-based: Research papers are based on empirical evidence, which is collected through rigorous research methods such as experiments, surveys, observations, or interviews. They rely on objective data and facts to support their claims and conclusions.
  • Structured and organized: Research papers have a clear and logical structure, with sections such as introduction, literature review, methods, results, discussion, and conclusion. They are organized in a way that helps the reader to follow the argument and understand the findings.
  • Formal and objective: Research papers are written in a formal and objective tone, with an emphasis on clarity, precision, and accuracy. They avoid subjective language or personal opinions and instead rely on objective data and analysis to support their arguments.
  • Citations and references: Research papers include citations and references to acknowledge the sources of information and ideas used in the paper. They use a specific citation style, such as APA, MLA, or Chicago, to ensure consistency and accuracy.
  • Peer-reviewed: Research papers are often peer-reviewed, which means they are evaluated by other experts in the field before they are published. Peer-review ensures that the research is of high quality, meets ethical standards, and contributes to the advancement of knowledge in the field.
  • Objective and unbiased: Research papers strive to be objective and unbiased in their presentation of the findings. They avoid personal biases or preconceptions and instead rely on the data and analysis to draw conclusions.

Advantages of Research Papers

Research papers have many advantages, both for the individual researcher and for the broader academic and professional community. Here are some advantages of research papers:

  • Contribution to knowledge: Research papers contribute to the body of knowledge in a particular field or discipline. They add new information, insights, and perspectives to existing literature and help advance the understanding of a particular phenomenon or issue.
  • Opportunity for intellectual growth: Research papers provide an opportunity for intellectual growth for the researcher. They require critical thinking, problem-solving, and creativity, which can help develop the researcher’s skills and knowledge.
  • Career advancement: Research papers can help advance the researcher’s career by demonstrating their expertise and contributions to the field. They can also lead to new research opportunities, collaborations, and funding.
  • Academic recognition: Research papers can lead to academic recognition in the form of awards, grants, or invitations to speak at conferences or events. They can also contribute to the researcher’s reputation and standing in the field.
  • Impact on policy and practice: Research papers can have a significant impact on policy and practice. They can inform policy decisions, guide practice, and lead to changes in laws, regulations, or procedures.
  • Advancement of society: Research papers can contribute to the advancement of society by addressing important issues, identifying solutions to problems, and promoting social justice and equality.

Limitations of Research Papers

Research papers also have some limitations that should be considered when interpreting their findings or implications. Here are some common limitations of research papers:

  • Limited generalizability: Research findings may not be generalizable to other populations, settings, or contexts. Studies often use specific samples or conditions that may not reflect the broader population or real-world situations.
  • Potential for bias : Research papers may be biased due to factors such as sample selection, measurement errors, or researcher biases. It is important to evaluate the quality of the research design and methods used to ensure that the findings are valid and reliable.
  • Ethical concerns: Research papers may raise ethical concerns, such as the use of vulnerable populations or invasive procedures. Researchers must adhere to ethical guidelines and obtain informed consent from participants to ensure that the research is conducted in a responsible and respectful manner.
  • Limitations of methodology: Research papers may be limited by the methodology used to collect and analyze data. For example, certain research methods may not capture the complexity or nuance of a particular phenomenon, or may not be appropriate for certain research questions.
  • Publication bias: Research papers may be subject to publication bias, where positive or significant findings are more likely to be published than negative or non-significant findings. This can skew the overall findings of a particular area of research.
  • Time and resource constraints: Research papers may be limited by time and resource constraints, which can affect the quality and scope of the research. Researchers may not have access to certain data or resources, or may be unable to conduct long-term studies due to practical limitations.


Scaffolding Methods for Research Paper Writing


Students will use scaffolding to research and organize information for writing a research paper. A research paper scaffold provides students with clear support for writing expository papers that include a question (problem), literature review, analysis, methodology for original research, results, conclusion, and references. Students examine informational text, use an inquiry-based approach, and practice genre-specific strategies for expository writing. Depending on the goals of the assignment, students may work collaboratively or as individuals. A student-written paper about color psychology provides an authentic model of a scaffold and the corresponding finished paper. The research paper scaffold is designed to be completed during seven or eight sessions over the course of four to six weeks.

Featured Resources

  • Research Paper Scaffold: This handout guides students in researching and organizing the information they need for writing their research paper.
  • Inquiry on the Internet: Evaluating Web Pages for a Class Collection: Students use Internet search engines and Web analysis checklists to evaluate online resources, then write annotations that explain how and why the resources will be valuable to the class.

From Theory to Practice

  • Research paper scaffolding provides a temporary linguistic tool to assist students as they organize their expository writing. Scaffolding assists students in moving to levels of language performance they might be unable to obtain without this support.
  • An instructional scaffold essentially changes the role of the teacher from that of giver of knowledge to leader in inquiry. This relationship encourages creative intelligence on the part of both teacher and student, which in turn may broaden the notion of literacy so as to include more learning styles.
  • An instructional scaffold is useful for expository writing because of its basis in problem solving, ownership, appropriateness, support, collaboration, and internalization. It allows students to start where they are comfortable, and provides a genre-based structure for organizing creative ideas.
  • In order for students to take ownership of knowledge, they must learn to rework raw information, use details and facts, and write.
  • Teaching writing should involve direct, explicit comprehension instruction, effective instructional principles embedded in content, motivation and self-directed learning, and text-based collaborative learning to improve middle school and high school literacy.


NCTE/IRA National Standards for the English Language Arts

  • 1. Students read a wide range of print and nonprint texts to build an understanding of texts, of themselves, and of the cultures of the United States and the world; to acquire new information; to respond to the needs and demands of society and the workplace; and for personal fulfillment. Among these texts are fiction and nonfiction, classic and contemporary works.
  • 2. Students read a wide range of literature from many periods in many genres to build an understanding of the many dimensions (e.g., philosophical, ethical, aesthetic) of human experience.
  • 3. Students apply a wide range of strategies to comprehend, interpret, evaluate, and appreciate texts. They draw on their prior experience, their interactions with other readers and writers, their knowledge of word meaning and of other texts, their word identification strategies, and their understanding of textual features (e.g., sound-letter correspondence, sentence structure, context, graphics).
  • 4. Students adjust their use of spoken, written, and visual language (e.g., conventions, style, vocabulary) to communicate effectively with a variety of audiences and for different purposes.
  • 5. Students employ a wide range of strategies as they write and use different writing process elements appropriately to communicate with different audiences for a variety of purposes.
  • 6. Students apply knowledge of language structure, language conventions (e.g., spelling and punctuation), media techniques, figurative language, and genre to create, critique, and discuss print and nonprint texts.
  • 7. Students conduct research on issues and interests by generating ideas and questions, and by posing problems. They gather, evaluate, and synthesize data from a variety of sources (e.g., print and nonprint texts, artifacts, people) to communicate their discoveries in ways that suit their purpose and audience.
  • 8. Students use a variety of technological and information resources (e.g., libraries, databases, computer networks, video) to gather and synthesize information and to create and communicate knowledge.
  • 12. Students use spoken, written, and visual language to accomplish their own purposes (e.g., for learning, enjoyment, persuasion, and the exchange of information).

Materials and Technology

Computers with Internet access and printing capability

  • Research Paper Scaffold
  • Example Research Paper Scaffold
  • Example Student Research Paper
  • Internet Citation Checklist
  • Research Paper Scoring Rubric
  • Permission Form (optional)

Preparation

Student Objectives

Students will

  • Formulate a clear thesis that conveys a perspective on the subject of their research
  • Practice research skills, including evaluation of sources, paraphrasing and summarizing relevant information, and citation of sources used
  • Logically group and sequence ideas in expository writing
  • Organize and display information on charts, maps, and graphs

Session 1: Research Question

You should approve students’ final research questions before Session 2. You may also wish to send home the Permission Form with students, to make parents aware of their child’s research topic and the project due dates.

Session 2: Literature Review—Search

Prior to this session, you may want to introduce or review Internet search techniques using the lesson Inquiry on the Internet: Evaluating Web Pages for a Class Collection . You may also wish to consult with the school librarian regarding subscription databases designed specifically for student research, which may be available through the school or public library. Using these types of resources will help to ensure that students find relevant and appropriate information. Using Internet search engines such as Google can be overwhelming to beginning researchers.

Session 3: Literature Review—Notes

Students need to bring their articles to this session. For large classes, have students highlight relevant information (as described below) and submit the articles for assessment before beginning the session.

Checking Literature Review entries on the same day is best practice, as it gives both you and the student time to plan and address any problems before proceeding. Note that in the finished product this literature review section will be about six paragraphs, so students need to gather enough facts to fit this format.

Session 4: Analysis

Session 5: Original Research

Students should design some form of original research appropriate to their topics, but they do not necessarily have to conduct the experiments or surveys they propose. Depending on the appropriateness of the original research proposals, the time involved, and the resources available, you may prefer to omit the actual research or use it as an extension activity.

Session 6: Results (optional)

Session 7: Conclusion

Session 8: References and Writing Final Draft

Student Assessment / Reflections

  • Observe students’ participation in the initial stages of the Research Paper Scaffold and promptly address any errors or misconceptions about the research process.
  • Observe students and provide feedback as they complete each section of the Research Paper Scaffold.
  • Provide a safe environment where students will want to take risks in exploring ideas. During collaborative work, offer feedback and guidance to those who need encouragement or require assistance in learning cooperation and tolerance.
  • Involve students in using the Research Paper Scoring Rubric for final evaluation of the research paper. Go over this rubric during Session 8, before they write their final drafts.

Systematic Review Article: A Critical Review of Research on Student Self-Assessment


  • Educational Psychology and Methodology, University at Albany, Albany, NY, United States

This article is a review of research on student self-assessment conducted largely between 2013 and 2018. The purpose of the review is to provide an updated overview of theory and research. The treatment of theory involves articulating a refined definition and operationalization of self-assessment. The review of 76 empirical studies offers a critical perspective on what has been investigated, including the relationship between self-assessment and achievement, consistency of self-assessment and others' assessments, student perceptions of self-assessment, and the association between self-assessment and self-regulated learning. An argument is made for less research on consistency and summative self-assessment, and more on the cognitive and affective mechanisms of formative self-assessment.

This review of research on student self-assessment expands on a review published as a chapter in the Cambridge Handbook of Instructional Feedback ( Andrade, 2018 , reprinted with permission). The timespan for the original review was January 2013 to October 2016. A lot of research has been done on the subject since then, including at least two meta-analyses; hence this expanded review, in which I provide an updated overview of theory and research. The treatment of theory presented here involves articulating a refined definition and operationalization of self-assessment through a lens of feedback. My review of the growing body of empirical research offers a critical perspective, in the interest of provoking new investigations into neglected areas.

Defining and Operationalizing Student Self-Assessment

Without exception, reviews of self-assessment ( Sargeant, 2008 ; Brown and Harris, 2013 ; Panadero et al., 2016a ) call for clearer definitions: What is self-assessment, and what is not? This question is surprisingly difficult to answer, as the term self-assessment has been used to describe a diverse range of activities, such as assigning a happy or sad face to a story just told, estimating the number of correct answers on a math test, graphing scores for dart throwing, indicating understanding (or the lack thereof) of a science concept, using a rubric to identify strengths and weaknesses in one's persuasive essay, writing reflective journal entries, and so on. Each of those activities involves some kind of assessment of one's own functioning, but they are so different that distinctions among types of self-assessment are needed. I will draw those distinctions in terms of the purposes of self-assessment which, in turn, determine its features: a classic form-fits-function analysis.

What is Self-Assessment?

Brown and Harris (2013) defined self-assessment in the K-16 context as a “descriptive and evaluative act carried out by the student concerning his or her own work and academic abilities” (p. 368). Panadero et al. (2016a) defined it as a “wide variety of mechanisms and techniques through which students describe (i.e., assess) and possibly assign merit or worth to (i.e., evaluate) the qualities of their own learning processes and products” (p. 804). Referring to physicians, Epstein et al. (2008) defined “concurrent self-assessment” as “ongoing moment-to-moment self-monitoring” (p. 5). Self-monitoring “refers to the ability to notice our own actions, curiosity to examine the effects of those actions, and willingness to use those observations to improve behavior and thinking in the future” (p. 5). Taken together, these definitions include self-assessment of one's abilities, processes , and products —everything but the kitchen sink. This very broad conception might seem unwieldy, but it works because each object of assessment—competence, process, and product—is subject to the influence of feedback from oneself.

What is missing from each of these definitions, however, is the purpose of the act of self-assessment. Their authors might rightly point out that the purpose is implied, but a formal definition requires us to make it plain: Why do we ask students to self-assess? I have long held that self-assessment is feedback ( Andrade, 2010 ), and that the purpose of feedback is to inform adjustments to processes and products that deepen learning and enhance performance; hence the purpose of self-assessment is to generate feedback that promotes learning and improvements in performance. This learning-oriented purpose of self-assessment implies that it should be formative: if there is no opportunity for adjustment and correction, self-assessment is almost pointless.

Why Self-Assess?

Clarity about the purpose of self-assessment allows us to interpret what otherwise appear to be discordant findings from research, which has produced mixed results in terms of both the accuracy of students' self-assessments and their influence on learning and/or performance. I believe the source of the discord can be traced to the different ways in which self-assessment is carried out, such as whether it is summative or formative. This issue will be taken up again in the review of current research that follows this overview. For now, consider a study of the accuracy and validity of summative self-assessment in teacher education conducted by Tejeiro et al. (2012), which showed that students' self-assigned marks tended to be higher than marks given by professors. All 122 students in the study assigned themselves a grade at the end of their course, but half of the students were told that their self-assigned grade would count toward 5% of their final grade. In both groups, students' self-assessments were higher than grades given by professors, especially for students with “poorer results” (p. 791) and those for whom self-assessment counted toward the final grade. In the group that was told their self-assessments would count toward their final grade, no relationship was found between the professor's and the students' assessments. Tejeiro et al. concluded that, although students' and professor's assessments tended to be highly similar when self-assessment did not count toward final grades, overestimations increased dramatically when students' self-assessments did count. Interviews of students who self-assigned highly discrepant grades revealed (as you might guess) that they were motivated by the desire to obtain the highest possible grades.

Studies like Tejeiro et al.'s (2012) are interesting in terms of the information they provide about the relationship between consistency and honesty, but the purpose of the self-assessment, beyond addressing interesting research questions, is unclear. There is no feedback purpose. This is also true for another example of a study of summative self-assessment of competence, during which elementary-school children took the Test of Narrative Language and then were asked to self-evaluate “how you did in making up stories today” by pointing to one of five pictures, from a “very happy face” (rating of five) to a “very sad face” (rating of one) (Kaderavek et al., 2004, p. 37). The usual results were reported: Older children and good narrators were more accurate than younger children and poor narrators, and males tended to more frequently overestimate their ability.

Typical of clinical studies of accuracy in self-evaluation, this study rests on a definition and operationalization of self-assessment with no value in terms of instructional feedback. If those children were asked to rate their stories and then revise or, better yet, if they assessed their stories according to clear, developmentally appropriate criteria before revising, the valence of their self-assessments in terms of instructional feedback would skyrocket. I speculate that their accuracy would too. In contrast, studies of formative self-assessment suggest that when the act of self-assessing is given a learning-oriented purpose, students' self-assessments are relatively consistent with those of external evaluators, including professors ( Lopez and Kossack, 2007 ; Barney et al., 2012 ; Leach, 2012 ), teachers ( Bol et al., 2012 ; Chang et al., 2012 , 2013 ), researchers ( Panadero and Romero, 2014 ; Fitzpatrick and Schulz, 2016 ), and expert medical assessors ( Hawkins et al., 2012 ).

My commitment to keeping self-assessment formative is firm. However, Gavin Brown (personal communication, April 2011) reminded me that summative self-assessment exists and we cannot ignore it; any definition of self-assessment must acknowledge and distinguish between formative and summative forms of it. Thus, the taxonomy in Table 1 , which depicts self-assessment as serving formative and/or summative purposes, and focuses on competence, processes, and/or products.

Table 1. A taxonomy of self-assessment.

Fortunately, a formative view of self-assessment seems to be taking hold in various educational contexts. For instance, Sargeant (2008) noted that all seven authors in a special issue of the Journal of Continuing Education in the Health Professions “conceptualize self-assessment within a formative, educational perspective, and see it as an activity that draws upon both external and internal data, standards, and resources to inform and make decisions about one's performance” (p. 1). Sargeant also stresses the point that self-assessment should be guided by evaluative criteria: “Multiple external sources can and should inform self-assessment, perhaps most important among them performance standards” (p. 1). Now we are talking about the how of self-assessment, which demands an operationalization of self-assessment practice. Let us examine each object of self-assessment (competence, processes, and/or products) with an eye for what is assessed and why.

What is Self-Assessed?

Monitoring and self-assessing processes are practically synonymous with self-regulated learning (SRL), or at least central components of it such as goal-setting and monitoring, or metacognition. Research on SRL has clearly shown that self-generated feedback on one's approach to learning is associated with academic gains ( Zimmerman and Schunk, 2011 ). Self-assessment of the products, such as papers and presentations, is the easiest to defend as feedback, especially when those self-assessments are grounded in explicit, relevant, evaluative criteria and followed by opportunities to relearn and/or revise ( Andrade, 2010 ).

Including the self-assessment of competence in this definition is a little trickier. I hesitated to include it because of the risk of sneaking in global assessments of one's overall ability, self-esteem, and self-concept (“I'm good enough, I'm smart enough, and doggone it, people like me,” Franken, 1992 ), which do not seem relevant to a discussion of feedback in the context of learning. Research on global self-assessment, or self-perception, is popular in the medical education literature, but even there, scholars have begun to question its usefulness in terms of influencing learning and professional growth (e.g., see Sargeant et al., 2008 ). Eva and Regehr (2008) seem to agree in the following passage, which states the case in a way that makes it worthy of a long quotation:

Self-assessment is often (implicitly or otherwise) conceptualized as a personal, unguided reflection on performance for the purposes of generating an individually derived summary of one's own level of knowledge, skill, and understanding in a particular area. For example, this conceptualization would appear to be the only reasonable basis for studies that fit into what Colliver et al. (2005) has described as the “guess your grade” model of self-assessment research, the results of which form the core foundation for the recurring conclusion that self-assessment is generally poor. This unguided, internally generated construction of self-assessment stands in stark contrast to the model put forward by Boud (1999) , who argued that the phrase self-assessment should not imply an isolated or individualistic activity; it should commonly involve peers, teachers, and other sources of information. The conceptualization of self-assessment as enunciated in Boud's description would appear to involve a process by which one takes personal responsibility for looking outward, explicitly seeking feedback, and information from external sources, then using these externally generated sources of assessment data to direct performance improvements. In this construction, self-assessment is more of a pedagogical strategy than an ability to judge for oneself; it is a habit that one needs to acquire and enact rather than an ability that one needs to master (p. 15).

As in the K-16 context, self-assessment is coming to be seen as having value as much or more so in terms of pedagogy as in assessment ( Silver et al., 2008 ; Brown and Harris, 2014 ). In the end, however, I decided that self-assessing one's competence to successfully learn a particular concept or complete a particular task (which sounds a lot like self-efficacy—more on that later) might be useful feedback because it can inform decisions about how to proceed, such as the amount of time to invest in learning how to play the flute, or whether or not to seek help learning the steps of the jitterbug. An important caveat, however, is that self-assessments of competence are only useful if students have opportunities to do something about their perceived low competence—that is, it serves the purpose of formative feedback for the learner.

How to Self-Assess?

Panadero et al. (2016a) summarized five very different taxonomies of self-assessment and called for the development of a comprehensive typology that considers, among other things, its purpose, the presence or absence of criteria, and the method. In response, I propose the taxonomy depicted in Table 1 , which focuses on the what (competence, process, or product), the why (formative or summative), and the how (methods, including whether or not they include standards, e.g., criteria) of self-assessment. The collection of example methods in the table is not exhaustive.

I put the methods in Table 1 where I think they belong, but many of them could be placed in more than one cell. Take self-efficacy , for instance, which is essentially a self-assessment of one's competence to successfully undertake a particular task ( Bandura, 1997 ). Summative judgments of self-efficacy are certainly possible but they seem like a silly thing to do—what is the point, from a learning perspective? Formative self-efficacy judgments, on the other hand, can inform next steps in learning and skill building. There is reason to believe that monitoring and making adjustments to one's self-efficacy (e.g., by setting goals or attributing success to effort) can be productive ( Zimmerman, 2000 ), so I placed self-efficacy in the formative row.

It is important to emphasize that self-efficacy is task-specific, more or less ( Bandura, 1997 ). This taxonomy does not include general, holistic evaluations of one's abilities, for example, “I am good at math.” Global assessment of competence does not provide the leverage, in terms of feedback, that is provided by task-specific assessments of competence, that is, self-efficacy. Eva and Regehr (2008) provided an illustrative example: “We suspect most people are prompted to open a dictionary as a result of encountering a word for which they are uncertain of the meaning rather than out of a broader assessment that their vocabulary could be improved” (p. 16). The exclusion of global evaluations of oneself resonates with research that clearly shows that feedback that focuses on aspects of a task (e.g., “I did not solve most of the algebra problems”) is more effective than feedback that focuses on the self (e.g., “I am bad at math”) ( Kluger and DeNisi, 1996 ; Dweck, 2006 ; Hattie and Timperley, 2007 ). Hence, global self-evaluations of ability or competence do not appear in Table 1 .

Another approach to student self-assessment that could be placed in more than one cell is traffic lights . The term traffic lights refers to asking students to use green, yellow, or red objects (or thumbs up, sideways, or down—anything will do) to indicate whether they think they have good, partial, or little understanding ( Black et al., 2003 ). It would be appropriate for traffic lights to appear in multiple places in Table 1 , depending on how they are used. Traffic lights seem to be most effective at supporting students' reflections on how well they understand a concept or have mastered a skill, which is in line with their creators' original intent, so they are categorized as formative self-assessments of one's learning—which sounds like metacognition.

In fact, several of the methods included in Table 1 come from research on metacognition, including self-monitoring , such as checking one's reading comprehension, and self-testing , e.g., checking one's performance on test items. These last two methods have been excluded from some taxonomies of self-assessment (e.g., Boud and Brew, 1995 ) because they do not engage students in explicitly considering relevant standards or criteria. However, new conceptions of self-assessment are grounded in theories of the self- and co-regulation of learning ( Andrade and Brookhart, 2016 ), which includes self-monitoring of learning processes with and without explicit standards.

However, my research favors self-assessment with regard to standards ( Andrade and Boulay, 2003 ; Andrade and Du, 2007 ; Andrade et al., 2008 , 2009 , 2010 ), as does related research by Panadero and his colleagues (see below). I have involved students in self-assessment of stories, essays, or mathematical word problems according to rubrics or checklists with criteria. For example, two studies investigated the relationship between elementary or middle school students' scores on a written assignment and a process that involved them in reading a model paper, co-creating criteria, self-assessing first drafts with a rubric, and revising ( Andrade et al., 2008 , 2010 ). The self-assessment was highly scaffolded: students were asked to underline key phrases in the rubric with colored pencils (e.g., underline “clearly states an opinion” in blue), then underline or circle in their drafts the evidence of having met the standard articulated by the phrase (e.g., his or her opinion) with the same blue pencil. If students found they had not met the standard, they were asked to write themselves a reminder to make improvements when they wrote their final drafts. This process was followed for each criterion on the rubric. There were main effects on scores for every self-assessed criterion on the rubric, suggesting that guided self-assessment according to the co-created criteria helped students produce more effective writing.

Panadero and his colleagues have also done quasi-experimental and experimental research on standards-referenced self-assessment, using rubrics or lists of assessment criteria that are presented in the form of questions ( Panadero et al., 2012 , 2013 , 2014 ; Panadero and Romero, 2014 ). Panadero calls the list of assessment criteria a script because his work is grounded in research on scaffolding (e.g., Kollar et al., 2006 ): I call it a checklist because that is the term used in classroom assessment contexts. Either way, the list provides standards for the task. Here is a script for a written summary that Panadero et al. (2014) used with college students in a psychology class:

• Does my summary transmit the main idea from the text? Is it at the beginning of my summary?

• Are the important ideas also in my summary?

• Have I selected the main ideas from the text to make them explicit in my summary?

• Have I thought about my purpose for the summary? What is my goal?

Taken together, the results of the studies cited above suggest that students who engaged in self-assessment using scripts or rubrics were more self-regulated, as measured by self-report questionnaires and/or think aloud protocols, than were students in the comparison or control groups. Effect sizes were very small to moderate (η² = 0.06–0.42), and statistically significant. Most interesting, perhaps, is one study ( Panadero and Romero, 2014 ) that demonstrated an association between rubric-referenced self-assessment activities and all three phases of SRL: forethought, performance, and reflection.
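As a reminder for readers less familiar with this effect-size metric, eta squared expresses the proportion of total variance in the outcome that is attributable to the effect:

    η² = SS_effect / SS_total

so the values of 0.06–0.42 reported above correspond roughly to 6% to 42% of the variance in the self-regulation measures.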

There are surely many other methods of self-assessment to include in Table 1 , as well as interesting conversations to be had about which method goes where and why. In the meantime, I offer the taxonomy in Table 1 as a way to define and operationalize self-assessment in instructional contexts and as a framework for the following overview of current research on the subject.

An Overview of Current Research on Self-Assessment

Several recent reviews of self-assessment are available ( Brown and Harris, 2013 ; Brown et al., 2015 ; Panadero et al., 2017 ), so I will not summarize the entire body of research here. Instead, I chose to take a bird's-eye view of the field, with the goal of reporting on what has been sufficiently researched and what remains to be done. I used the reference lists from reviews, as well as other relevant sources, as a starting point. In order to update the list of sources, I directed two new searches, the first of the ERIC database, and the second of both ERIC and PsycINFO. Both searches included two search terms, “self-assessment” OR “self-evaluation.” Advanced search options had four delimiters: (1) peer-reviewed, (2) January, 2013–October, 2016 and then October 2016–March 2019, (3) English, and (4) full-text. Because the focus was on K-20 educational contexts, sources were excluded if they were about early childhood education or professional development.

The first search yielded 347 hits; the second 1,163. Research that was unrelated to instructional feedback was excluded, such as studies limited to self-estimates of performance before or after taking a test, guesses about whether a test item was answered correctly, and estimates of how many tasks could be completed in a certain amount of time. Although some of the excluded studies might be thought of as useful investigations of self-monitoring, as a group they seemed too unrelated to theories of self-generated feedback to be appropriate for this review. Seventy-six studies were selected for inclusion in Table S1 (Supplementary Material), which also contains a few studies published before 2013 that were not included in key reviews, as well as studies solicited directly from authors.

Table S1 in the Supplementary Material contains a complete list of studies included in this review, organized by the focus or topic of the study, as well as brief descriptions of each. The “type” column in Table S1 (Supplementary Material) indicates whether the study focused on formative or summative self-assessment. This distinction was often difficult to make due to a lack of information. For example, Memis and Seven (2015) frame their study in terms of formative assessment, and note that the purpose of the self-evaluation done by the sixth grade students is to “help students improve their [science] reports” (p. 39), but they do not indicate how the self-assessments were done, nor whether students were given time to revise their reports based on their judgments or supported in making revisions. A sentence or two of explanation about the process of self-assessment in the procedures sections of published studies would be most useful.

Figure 1 graphically represents the number of studies in the four most common topic categories found in the table—achievement, consistency, student perceptions, and SRL. The figure reveals that research on self-assessment is on the rise, with consistency the most popular topic. Of the 76 studies in the table in the appendix, 44 were inquiries into the consistency of students' self-assessments with other judgments (e.g., a test score or teacher's grade). Twenty-five studies investigated the relationship between self-assessment and achievement. Fifteen explored students' perceptions of self-assessment. Twelve studies focused on the association between self-assessment and self-regulated learning. One examined self-efficacy, and two qualitative studies documented the mental processes involved in self-assessment. The sum ( n = 99) of the list of research topics is more than 76 because several studies had multiple foci. In the remainder of this review I examine each topic in turn.

Figure 1. Topics of self-assessment studies, 2013–2018.

Consistency

Table S1 (Supplementary Material) reveals that much of the recent research on self-assessment has investigated the accuracy or, more accurately, consistency, of students' self-assessments. The term consistency is more appropriate in the classroom context because the quality of students' self-assessments is often determined by comparing them with their teachers' assessments and then generating correlations. Given the evidence of the unreliability of teachers' grades ( Falchikov, 2005 ), the assumption that teachers' assessments are accurate might not be well-founded ( Leach, 2012 ; Brown et al., 2015 ). Ratings of student work done by researchers are also suspect, unless evidence of the validity and reliability of the inferences made about student work by researchers is available. Consequently, much of the research on classroom-based self-assessment should use the term consistency , which refers to the degree of alignment between students' and expert raters' evaluations, avoiding the purer, more rigorous term accuracy unless it is fitting.
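As a simple illustration of what such a consistency analysis typically involves, the sketch below computes a Pearson correlation between paired student self-ratings and teacher ratings. The ratings are invented placeholder values, not data from any study cited here.

    # Illustrative sketch: student-teacher consistency as a Pearson correlation.
    # The rubric scores below are invented placeholder values.
    from scipy.stats import pearsonr

    student_self_ratings = [4, 3, 5, 2, 4, 3, 5, 4]  # self-assigned rubric scores (1-5)
    teacher_ratings = [3, 3, 4, 2, 4, 2, 5, 3]       # teacher scores for the same work

    r, p_value = pearsonr(student_self_ratings, teacher_ratings)
    print(f"consistency: r = {r:.2f}, p = {p_value:.3f}")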

In their review, Brown and Harris (2013) reported that correlations between student self-ratings and other measures tended to be weakly to strongly positive, ranging from r ≈ 0.20 to 0.80, with few studies reporting correlations >0.60. But their review included results from studies of any self-appraisal of school work, including summative self-rating/grading, predictions about the correctness of answers on test items, and formative, criteria-based self-assessments, a combination of methods that makes the correlations they reported difficult to interpret. Qualitatively different forms of self-assessment, especially summative and formative types, cannot be lumped together without obfuscating important aspects of self-assessment as feedback.

Given my concern about combining studies of summative and formative assessment, you might anticipate a call for research on consistency that distinguishes between the two. I will make no such call for three reasons. One is that we have enough research on the subject, including the 22 studies in Table S1 (Supplementary Material) that were published after Brown and Harris's review (2013 ). Drawing only on studies included in Table S1 (Supplementary Material), we can say with confidence that summative self-assessment tends to be inconsistent with external judgements ( Baxter and Norman, 2011 ; De Grez et al., 2012 ; Admiraal et al., 2015 ), with males tending to overrate and females to underrate ( Nowell and Alston, 2007 ; Marks et al., 2018 ). There are exceptions ( Alaoutinen, 2012 ; Lopez-Pastor et al., 2012 ) as well as mixed results, with students being consistent regarding some aspects of their learning but not others ( Blanch-Hartigan, 2011 ; Harding and Hbaci, 2015 ; Nguyen and Foster, 2018 ). We can also say that older, more academically competent learners tend to be more consistent ( Hacker et al., 2000 ; Lew et al., 2010 ; Alaoutinen, 2012 ; Guillory and Blankson, 2017 ; Butler, 2018 ; Nagel and Lindsey, 2018 ). There is evidence that consistency can be improved through experience ( Lopez and Kossack, 2007 ; Yilmaz, 2017 ; Nagel and Lindsey, 2018 ), the use of guidelines ( Bol et al., 2012 ), feedback ( Thawabieh, 2017 ), and standards ( Baars et al., 2014 ), perhaps in the form of rubrics ( Panadero and Romero, 2014 ). Modeling and feedback also help ( Labuhn et al., 2010 ; Miller and Geraci, 2011 ; Hawkins et al., 2012 ; Kostons et al., 2012 ).

An outcome typical of research on the consistency of summative self-assessment can be found in row 59, which summarizes the study by Tejeiro et al. (2012) discussed earlier: Students' self-assessments were higher than marks given by professors, especially for students with poorer results, and no relationship was found between the professors' and the students' assessments in the group in which self-assessment counted toward the final mark. Students are not stupid: if they know that they can influence their final grade, and that their judgment is summative rather than intended to inform revision and improvement, they will be motivated to inflate their self-evaluation. I do not believe we need more research to demonstrate that phenomenon.

The second reason I am not calling for additional research on consistency is a lot of it seems somewhat irrelevant. This might be because the interest in accuracy is rooted in clinical research on calibration, which has very different aims. Calibration accuracy is the “magnitude of consent between learners' true and self-evaluated task performance. Accurately calibrated learners' task performance equals their self-evaluated task performance” ( Wollenschläger et al., 2016 ). Calibration research often asks study participants to predict or postdict the correctness of their responses to test items. I caution about generalizing from clinical experiments to authentic classroom contexts because the dismal picture of our human potential to self-judge was painted by calibration researchers before study participants were effectively taught how to predict with accuracy, or provided with the tools they needed to be accurate, or motivated to do so. Calibration researchers know that, of course, and have conducted intervention studies that attempt to improve accuracy, with some success (e.g., Bol et al., 2012 ). Studies of formative self-assessment also suggest that consistency increases when it is taught and supported in many of the ways any other skill must be taught and supported ( Lopez and Kossack, 2007 ; Labuhn et al., 2010 ; Chang et al., 2012 , 2013 ; Hawkins et al., 2012 ; Panadero and Romero, 2014 ; Lin-Siegler et al., 2015 ; Fitzpatrick and Schulz, 2016 ).

Even clinical psychological studies that go beyond calibration to examine the associations between monitoring accuracy and subsequent study behaviors do not transfer well to classroom assessment research. After repeatedly encountering claims that, for example, low self-assessment accuracy leads to poor task-selection accuracy and “suboptimal learning outcomes” ( Raaijmakers et al., 2019 , p. 1), I dug into the cited studies and discovered two limitations. The first is that the tasks in which study participants engage are quite inauthentic. A typical task involves studying “word pairs (e.g., railroad—mother), followed by a delayed judgment of learning (JOL) in which the students predicted the chances of remembering the pair… After making a JOL, the entire pair was presented for restudy for 4 s [ sic ], and after all pairs had been restudied, a criterion test of paired-associate recall occurred” ( Dunlosky and Rawson, 2012 , p. 272). Although memory for word pairs might be important in some classroom contexts, it is not safe to assume that results from studies like that one can predict students' behaviors after criterion-referenced self-assessment of their comprehension of complex texts, lengthy compositions, or solutions to multi-step mathematical problems.

The second limitation of studies like the typical one described above is more serious: Participants in research like that are not permitted to regulate their own studying, which is experimentally manipulated by a computer program. This came as a surprise, since many of the claims were about students' poor study choices but they were rarely allowed to make actual choices. For example, Dunlosky and Rawson (2012) permitted participants to “use monitoring to effectively control learning” by programming the computer so that “a participant would need to have judged his or her recall of a definition entirely correct on three different trials, and once they judged it entirely correct on the third trial, that particular key term definition was dropped [by the computer program] from further practice” (p. 272). The authors note that this study design is an improvement on designs that did not require all participants to use the same regulation algorithm, but it does not reflect the kinds of decisions that learners make in class or while doing homework. In fact, a large body of research shows that students can make wise choices when they self-pace the study of to-be-learned materials and then allocate study time to each item ( Bjork et al., 2013 , p. 425):

In a typical experiment, the students first study all the items at an experimenter-paced rate (e.g., study 60 paired associates for 3 s each), which familiarizes the students with the items; after this familiarity phase, the students then either choose which items they want to restudy (e.g., all items are presented in an array, and the students select which ones to restudy) and/or pace their restudy of each item. Several dependent measures have been widely used, such as how long each item is studied, whether an item is selected for restudy, and in what order items are selected for restudy. The literature on these aspects of self-regulated study is massive (for a comprehensive overview, see both Dunlosky and Ariel, 2011 and Son and Metcalfe, 2000 ), but the evidence is largely consistent with a few basic conclusions. First, if students have a chance to practice retrieval prior to restudying items, they almost exclusively choose to restudy unrecalled items and drop the previously recalled items from restudy ( Metcalfe and Kornell, 2005 ). Second, when pacing their study of individual items that have been selected for restudy, students typically spend more time studying items that are more, rather than less, difficult to learn. Such a strategy is consistent with a discrepancy-reduction model of self-paced study (which states that people continue to study an item until they reach mastery), although some key revisions to this model are needed to account for all the data. For instance, students may not continue to study until they reach some static criterion of mastery, but instead, they may continue to study until they perceive that they are no longer making progress.

I propose that this research, which suggests that students' unscaffolded, unmeasured, informal self-assessments tend to lead to appropriate task selection, is better aligned with research on classroom-based self-assessment. Nonetheless, even this comparison is inadequate because the study participants were not taught to compare their performance to the criteria for mastery, as is often done in classroom-based self-assessment.

The third and final reason I do not believe we need additional research on consistency is that I think it is a distraction from the true purposes of self-assessment. Many if not most of the articles about the accuracy of self-assessment are grounded in the assumption that accuracy is necessary for self-assessment to be useful, particularly in terms of subsequent studying and revision behaviors. Although it seems obvious that students' accurate evaluations of their own performance should positively influence their selection of study strategies, and thereby produce improvements in achievement, I have not seen relevant research that tests those conjectures. Some claim that inaccurate estimates of learning lead to the selection of inappropriate learning tasks (Kostons et al., 2012), but they cite research that does not support their claim. For example, Kostons et al. cite studies that focus on the effectiveness of SRL interventions but do not address the accuracy of participants' estimates of learning, nor the relationship of those estimates to the selection of next steps. Other studies produce findings that support my skepticism. Take, for instance, two relevant studies of calibration. One suggested that performance and judgments of performance had little influence on subsequent test preparation behavior (Hacker et al., 2000), and the other showed that study participants followed their predictions of performance to the same degree, regardless of monitoring accuracy (van Loon et al., 2014).

Eva and Regehr (2008) believe that:

Research questions that take the form of “How well do various practitioners self-assess?” “How can we improve self-assessment?” or “How can we measure self-assessment skill?” should be considered defunct and removed from the research agenda [because] there have been hundreds of studies into these questions and the answers are “Poorly,” “You can't,” and “Don't bother” (p. 18).

I almost agree. A study that could change my mind about the importance of accuracy of self-assessment would be an investigation that goes beyond attempting to improve accuracy just for the sake of accuracy by instead examining the relearning/revision behaviors of accurate and inaccurate self-assessors: Do students whose self-assessments match the valid and reliable judgments of expert raters (hence my use of the term accuracy ) make better decisions about what they need to do to deepen their learning and improve their work? Here, I admit, is a call for research related to consistency: I would love to see a high-quality investigation of the relationship between accuracy in formative self-assessment, and students' subsequent study and revision behaviors, and their learning. For example, a study that closely examines the revisions to writing made by accurate and inaccurate self-assessors, and the resulting outcomes in terms of the quality of their writing, would be most welcome.

Table S1 (Supplementary Material) indicates that by 2018 researchers began publishing studies that more directly address the hypothesized link between self-assessment and subsequent learning behaviors, as well as important questions about the processes learners engage in while self-assessing (Yan and Brown, 2017). One of these, a study by Nugteren et al. (2018; row 19 in Table S1, Supplementary Material), asked “How do inaccurate [summative] self-assessments influence task selections?” (p. 368) and employed a clever exploratory research design. The results suggested that most of the 15 students in their sample over-estimated their performance and made inaccurate learning-task selections. Nugteren et al. recommended helping students make more accurate self-assessments, but I think the more interesting finding is related to why students made task selections that were too difficult or too easy, given their prior performance: they based most task selections on interest in the content of particular items (not the overarching content to be learned), and infrequently considered task difficulty and support level. For instance, while working on the genetics tasks, students reported selecting tasks because they were fun or interesting, not because they addressed self-identified weaknesses in their understanding of genetics. Nugteren et al. proposed that students would benefit from instruction on task selection. I second that proposal: rather than directing our efforts toward accuracy in the service of improving subsequent task selection, let us simply teach students to use the information at hand to select the next best steps, among other things.

Butler (2018, row 76 in Table S1, Supplementary Material) has conducted at least two studies of learners' processes of responding to self-assessment items and how they arrived at their judgments. Comparing generic, decontextualized items to task-specific, contextualized items (which she calls after-task items), she drew two unsurprising conclusions: the task-specific items “generally showed higher correlations with task performance,” and older students “appeared to be more conservative in their judgment compared with their younger counterparts” (p. 249). The contribution of the study is the detailed information it provides about how students generated their judgments. For example, Butler's qualitative data analyses revealed that when asked to self-assess in terms of vague or non-specific items, the children often “contextualized the descriptions based on their own experiences, goals, and expectations” (p. 257), focused on the task at hand, and situated items in the specific task context. Perhaps as a result, the correlation between after-task self-assessment and task performance was generally higher than for generic self-assessment.

Butler (2018) notes that her study enriches our empirical understanding of the processes by which children respond to self-assessment. This is a very promising direction for the field. Similar studies of processing during formative self-assessment of a variety of task types in a classroom context would likely produce significant advances in our understanding of how and why self-assessment influences learning and performance.

Student Perceptions

Fifteen of the studies listed in Table S1 (Supplementary Material) focused on students' perceptions of self-assessment. The studies of children suggest that they tend to have unsophisticated understandings of its purposes ( Harris and Brown, 2013 ; Bourke, 2016 ) that might lead to shallow implementation of related processes. In contrast, results from the studies conducted in higher education settings suggested that college and university students understood the function of self-assessment ( Ratminingsih et al., 2018 ) and generally found it to be useful for guiding evaluation and revision ( Micán and Medina, 2017 ), understanding how to take responsibility for learning ( Lopez and Kossack, 2007 ; Bourke, 2014 ; Ndoye, 2017 ), prompting them to think more critically and deeply ( van Helvoort, 2012 ; Siow, 2015 ), applying newfound skills ( Murakami et al., 2012 ), and fostering self-regulated learning by guiding them to set goals, plan, self-monitor and reflect ( Wang, 2017 ).

Not surprisingly, positive perceptions of self-assessment were typically developed by students who actively engaged in the formative type by, for example, developing their own criteria for an effective self-assessment response (Bourke, 2014), or using a rubric or checklist to guide their assessments and then revising their work (Huang and Gui, 2015; Wang, 2017). Earlier research suggested that children's attitudes toward self-assessment can become negative if it is summative (Ross et al., 1998). However, even summative self-assessment was reported by adult learners to be useful in helping them become more critical of their own and others' writing, both during the course in question and in subsequent courses (van Helvoort, 2012).

Achievement

Twenty-five of the studies in Table S1 (Supplementary Material) investigated the relation between self-assessment and achievement, including two meta-analyses. Twenty of the 25 clearly employed the formative type. Without exception, those 20 studies, plus the two meta-analyses ( Graham et al., 2015 ; Sanchez et al., 2017 ) demonstrated a positive association between self-assessment and learning. The meta-analysis conducted by Graham and his colleagues, which included 10 studies, yielded an average weighted effect size of 0.62 on writing quality. The Sanchez et al. meta-analysis revealed that, although 12 of the 44 effect sizes were negative, on average, “students who engaged in self-grading performed better ( g = 0.34) on subsequent tests than did students who did not” (p. 1,049).
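For readers who want the effect-size metric unpacked, the following sketch in Python shows how a standardized mean difference such as Hedges' g is computed for a single study and how effects might be averaged across studies. All numbers are hypothetical, and the simple sample-size weighting at the end is only an approximation of the inverse-variance weighting that meta-analyses such as Graham et al. (2015) and Sanchez et al. (2017) actually use.

    import math

    def hedges_g(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
        # Standardized mean difference with Hedges' small-sample correction
        sd_pooled = math.sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
                              / (n_treat + n_ctrl - 2))
        d = (m_treat - m_ctrl) / sd_pooled
        correction = 1 - 3 / (4 * (n_treat + n_ctrl) - 9)
        return d * correction

    # Hypothetical study: self-assessment group vs. comparison group on a writing score
    g_single = hedges_g(82, 75, 10, 11, 30, 30)

    # Hypothetical (g, total N) pairs combined with a crude sample-size weight
    studies = [(0.62, 60), (0.34, 120), (-0.10, 45)]
    weighted_mean_g = sum(g * n for g, n in studies) / sum(n for _, n in studies)

    print(f"g for the hypothetical study = {g_single:.2f}")
    print(f"sample-size-weighted mean g = {weighted_mean_g:.2f}")

Read this way, the g of 0.34 reported by Sanchez et al. means that, on average, students who self-graded scored about a third of a standard deviation higher on subsequent tests than students who did not.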

All but two of the non-meta-analytic studies of achievement in Table S1 (Supplementary Material) were quasi-experimental or experimental, providing relatively rigorous evidence that their treatment groups outperformed their comparison or control groups in terms of everything from writing to dart-throwing, map-making, speaking English, and exams in a wide variety of disciplines. One experiment on summative self-assessment ( Miller and Geraci, 2011 ), in contrast, resulted in no improvements in exam scores, while the other one did ( Raaijmakers et al., 2017 ).

It would be easy to overgeneralize and claim that the question about the effect of self-assessment on learning has been answered, but there are unanswered questions about the key components of effective self-assessment, especially social-emotional components related to power and trust ( Andrade and Brown, 2016 ). The trends are pretty clear, however: it appears that formative forms of self-assessment can promote knowledge and skill development. This is not surprising, given that it involves many of the processes known to support learning, including practice, feedback, revision, and especially the intellectually demanding work of making complex, criteria-referenced judgments ( Panadero et al., 2014 ). Boud (1995a , b) predicted this trend when he noted that many self-assessment processes undermine learning by rushing to judgment, thereby failing to engage students with the standards or criteria for their work.

Self-Regulated Learning

The association between self-assessment and learning has also been explained in terms of self-regulation ( Andrade, 2010 ; Panadero and Alonso-Tapia, 2013 ; Andrade and Brookhart, 2016 , 2019 ; Panadero et al., 2016b ). Self-regulated learning (SRL) occurs when learners set goals and then monitor and manage their thoughts, feelings, and actions to reach those goals. SRL is moderately to highly correlated with achievement ( Zimmerman and Schunk, 2011 ). Research suggests that formative assessment is a potential influence on SRL ( Nicol and Macfarlane-Dick, 2006 ). The 12 studies in Table S1 (Supplementary Material) that focus on SRL demonstrate the recent increase in interest in the relationship between self-assessment and SRL.

Conceptual and practical overlaps between the two fields are abundant. In fact, Brown and Harris (2014) recommend that student self-assessment no longer be treated as an assessment, but as an essential competence for self-regulation. Butler and Winne (1995) introduced the role of self-generated feedback in self-regulation years ago:

[For] all self-regulated activities, feedback is an inherent catalyst. As learners monitor their engagement with tasks, internal feedback is generated by the monitoring process. That feedback describes the nature of outcomes and the qualities of the cognitive processes that led to those states (p. 245).

The outcomes and processes referred to by Butler and Winne are many of the same products and processes I referred to earlier in the definition of self-assessment and in Table 1 .

In general, research and practice related to self-assessment has tended to focus on judging the products of student learning, while scholarship on self-regulated learning encompasses both processes and products. The very practical focus of much of the research on self-assessment means that it might be playing catch-up, in terms of theory development, with the SRL literature, which is grounded in experimental paradigms from cognitive psychology (de Bruin and van Gog, 2012); in terms of implementation, however, self-assessment research is ahead (E. Panadero, personal communication, October 21, 2016). One major exception is the work done on Self-Regulated Strategy Development (Glaser and Brunstein, 2007; Harris et al., 2008), which has successfully integrated SRL research with classroom practices, including self-assessment, to teach writing to students with special needs.

Nicol and Macfarlane-Dick (2006) have been explicit about the potential for self-assessment practices to support self-regulated learning:

To develop systematically the learner's capacity for self-regulation, teachers need to create more structured opportunities for self-monitoring and the judging of progression to goals. Self-assessment tasks are an effective way of achieving this, as are activities that encourage reflection on learning progress (p. 207).

The studies of SRL in Table S1 (Supplementary Material) provide encouraging findings regarding the potential role of self-assessment in promoting achievement, self-regulated learning in general, and metacognition and study strategies related to task selection in particular. The studies also represent a solution to the “methodological and theoretical challenges involved in bringing metacognitive research to the real world, using meaningful learning materials” ( Koriat, 2012 , p. 296).

Future Directions for Research

I agree with Yan and Brown's (2017) statement that “from a pedagogical perspective, the benefits of self-assessment may come from active engagement in the learning process, rather than by being 'veridical' or coinciding with reality, because students' reflection and metacognitive monitoring lead to improved learning” (p. 1,248). Future research should focus less on accuracy/consistency/veridicality, and more on the precise mechanisms of self-assessment (Butler, 2018).

An important aspect of research on self-assessment that is not explicitly represented in Table S1 (Supplementary Material) is practice, or pedagogy: Under what conditions does self-assessment work best, and how are those conditions influenced by context? Fortunately, the studies listed in the table, as well as others (see especially Andrade and Valtcheva, 2009 ; Nielsen, 2014 ; Panadero et al., 2016a ), point toward an answer. But we still have questions about how best to scaffold effective formative self-assessment. One area of inquiry is about the characteristics of the task being assessed, and the standards or criteria used by learners during self-assessment.

Influence of Types of Tasks and Standards or Criteria

Type of task or competency assessed seems to matter (e.g., Dolosic, 2018 , Nguyen and Foster, 2018 ), as do the criteria ( Yilmaz, 2017 ), but we do not yet have a comprehensive understanding of how or why. There is some evidence that it is important that the criteria used to self-assess are concrete, task-specific ( Butler, 2018 ), and graduated. For example, Fastre et al. (2010) revealed an association between self-assessment according to task-specific criteria and task performance: In a quasi-experimental study of 39 novice vocational education students studying stoma care, they compared concrete, task-specific criteria (“performance-based criteria”) such as “Introduces herself to the patient” and “Consults the care file for details concerning the stoma” to vaguer, “competence-based criteria” such as “Shows interest, listens actively, shows empathy to the patient” and “Is discrete with sensitive topics.” The performance-based criteria group outperformed the competence-based group on tests of task performance, presumably because “performance-based criteria make it easier to distinguish levels of performance, enabling a step-by-step process of performance improvement” (p. 530).

This finding echoes the results of a study of self-regulated learning by Kitsantas and Zimmerman (2006) , who argued that “fine-grained standards can have two key benefits: They can enable learners to be more sensitive to small changes in skill and make more appropriate adaptations in learning strategies” (p. 203). In their study, 70 college students were taught how to throw darts at a target. The purpose of the study was to examine the role of graphing of self-recorded outcomes and self-evaluative standards in learning a motor skill. Students who were provided with graduated self-evaluative standards surpassed “those who were provided with absolute standards or no standards (control) in both motor skill and in motivational beliefs (i.e., self-efficacy, attributions, and self-satisfaction)” (p. 201). Kitsantas and Zimmerman hypothesized that setting high absolute standards would limit a learner's sensitivity to small improvements in functioning. This hypothesis was supported by the finding that students who set absolute standards reported significantly less awareness of learning progress (and hit the bull's-eye less often) than students who set graduated standards. “The correlation between the self-evaluation and dart-throwing outcomes measures was extraordinarily high ( r = 0.94)” (p. 210). Classroom-based research on specific, graduated self-assessment criteria would be informative.

Cognitive and Affective Mechanisms of Self-Assessment

There are many additional questions about pedagogy, such as the hoped-for investigation mentioned above of the relationship between accuracy in formative self-assessment, students' subsequent study behaviors, and their learning. There is also a need for research on how to help teachers give students a central role in their learning by creating space for self-assessment (e.g., see Hawe and Parr, 2014 ), and the complex power dynamics involved in doing so ( Tan, 2004 , 2009 ; Taras, 2008 ; Leach, 2012 ). However, there is an even more pressing need for investigations into the internal mechanisms experienced by students engaged in assessing their own learning. Angela Lui and I call this the next black box ( Lui, 2017 ).

Black and Wiliam (1998) used the term black box to emphasize the fact that what happened in most classrooms was largely unknown: all we knew was that some inputs (e.g., teachers, resources, standards, and requirements) were fed into the box, and that certain outputs (e.g., more knowledgeable and competent students, acceptable levels of achievement) would follow. But what, they asked, is happening inside, and what new inputs will produce better outputs? Black and Wiliam's review spawned a great deal of research on formative assessment, some but not all of which suggests a positive relationship with academic achievement ( Bennett, 2011 ; Kingston and Nash, 2011 ). To better understand why and how the use of formative assessment in general and self-assessment in particular is associated with improvements in academic achievement in some instances but not others, we need research that looks into the next black box: the cognitive and affective mechanisms of students who are engaged in assessment processes ( Lui, 2017 ).

The role of internal mechanisms has been discussed in theory but not yet fully tested. Crooks (1988) argued that the impact of assessment is influenced by students' interpretation of the tasks and results, and Butler and Winne (1995) theorized that both cognitive and affective processes play a role in determining how feedback is internalized and used to self-regulate learning. Other theoretical frameworks about the internal processes of receiving and responding to feedback have been developed (e.g., Nicol and Macfarlane-Dick, 2006 ; Draper, 2009 ; Andrade, 2013 ; Lipnevich et al., 2016 ). Yet, Shute (2008) noted in her review of the literature on formative feedback that “despite the plethora of research on the topic, the specific mechanisms relating feedback to learning are still mostly murky, with very few (if any) general conclusions” (p. 156). This area is ripe for research.

Conclusion

Self-assessment is the act of monitoring one's processes and products in order to make adjustments that deepen learning and enhance performance. Although it can be summative, the evidence presented in this review strongly suggests that self-assessment is most beneficial, in terms of both achievement and self-regulated learning, when it is used formatively and supported by training.

What is not yet clear is why and how self-assessment works. Those of you who like to investigate phenomena that are maddeningly difficult to measure will rejoice to hear that the cognitive and affective mechanisms of self-assessment are the next black box. Studies of the ways in which learners think and feel, the interactions between their thoughts and feelings and their context, and the implications for pedagogy will make major contributions to our field.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2019.00087/full#supplementary-material

Footnotes

1. I am grateful to my graduate assistants, Joanna Weaver and Taja Young, for conducting the searches.

References

Admiraal, W., Huisman, B., and Pilli, O. (2015). Assessment in massive open online courses. Electron. J. e-Learning , 13, 207–216.

Alaoutinen, S. (2012). Evaluating the effect of learning style and student background on self-assessment accuracy. Comput. Sci. Educ. 22, 175–198. doi: 10.1080/08993408.2012.692924

Al-Rawahi, N. M., and Al-Balushi, S. M. (2015). The effect of reflective science journal writing on students' self-regulated learning strategies. Int. J. Environ. Sci. Educ. 10, 367–379. doi: 10.12973/ijese.2015.250a

Andrade, H. (2010). “Students as the definitive source of formative assessment: academic self-assessment and the self-regulation of learning,” in Handbook of Formative Assessment, eds H. Andrade and G. Cizek (New York, NY: Routledge), 90–105.

Andrade, H. (2013). “Classroom assessment in the context of learning theory and research,” in Sage Handbook of Research on Classroom Assessment , ed J. H. McMillan (New York, NY: Sage), 17–34. doi: 10.4135/9781452218649.n2

Andrade, H. (2018). “Feedback in the context of self-assessment,” in Cambridge Handbook of Instructional Feedback , eds A. Lipnevich and J. Smith (Cambridge: Cambridge University Press), 376–408.

Andrade, H., and Boulay, B. (2003). The role of rubric-referenced self-assessment in learning to write. J. Educ. Res. 97, 21–34. doi: 10.1080/00220670309596625

Andrade, H., and Brookhart, S. (2019). Classroom assessment as the co-regulation of learning. Assess. Educ. Principles Policy Pract. doi: 10.1080/0969594X.2019.1571992

Andrade, H., and Brookhart, S. M. (2016). “The role of classroom assessment in supporting self-regulated learning,” in Assessment for Learning: Meeting the Challenge of Implementation , eds D. Laveault and L. Allal (Heidelberg: Springer), 293–309. doi: 10.1007/978-3-319-39211-0_17

Andrade, H., and Du, Y. (2007). Student responses to criteria-referenced self-assessment. Assess. Evalu. High. Educ. 32, 159–181. doi: 10.1080/02602930600801928

Andrade, H., Du, Y., and Mycek, K. (2010). Rubric-referenced self-assessment and middle school students' writing. Assess. Educ. 17, 199–214. doi: 10.1080/09695941003696172

Andrade, H., Du, Y., and Wang, X. (2008). Putting rubrics to the test: The effect of a model, criteria generation, and rubric-referenced self-assessment on elementary school students' writing. Educ. Meas. 27, 3–13. doi: 10.1111/j.1745-3992.2008.00118.x

Andrade, H., and Valtcheva, A. (2009). Promoting learning and achievement through self- assessment. Theory Pract. 48, 12–19. doi: 10.1080/00405840802577544

Andrade, H., Wang, X., Du, Y., and Akawi, R. (2009). Rubric-referenced self-assessment and self-efficacy for writing. J. Educ. Res. 102, 287–302. doi: 10.3200/JOER.102.4.287-302

Andrade, H. L., and Brown, G. T. L. (2016). “Student self-assessment in the classroom,” in Handbook of Human and Social Conditions in Assessment , eds G. T. L. Brown and L. R. Harris (New York, NY: Routledge), 319–334.

Baars, M., Vink, S., van Gog, T., de Bruin, A., and Paas, F. (2014). Effects of training self-assessment and using assessment standards on retrospective and prospective monitoring of problem solving. Learn. Instruc. 33, 92–107. doi: 10.1016/j.learninstruc.2014.04.004

Balderas, I., and Cuamatzi, P. M. (2018). Self and peer correction to improve college students' writing skills. Profile. 20, 179–194. doi: 10.15446/profile.v20n2.67095

Bandura, A. (1997). Self-efficacy: The Exercise of Control . New York, NY: Freeman.

Barney, S., Khurum, M., Petersen, K., Unterkalmsteiner, M., and Jabangwe, R. (2012). Improving students with rubric-based self-assessment and oral feedback. IEEE Transac. Educ. 55, 319–325. doi: 10.1109/TE.2011.2172981

Baxter, P., and Norman, G. (2011). Self-assessment or self deception? A lack of association between nursing students' self-assessment and performance. J. Adv. Nurs. 67, 2406–2413. doi: 10.1111/j.1365-2648.2011.05658.x

Bennett, R. E. (2011). Formative assessment: a critical review. Assess. Educ. 18, 5–25. doi: 10.1080/0969594X.2010.513678

Birjandi, P., and Hadidi Tamjid, N. (2012). The role of self-, peer and teacher assessment in promoting Iranian EFL learners' writing performance. Assess. Evalu. High. Educ. 37, 513–533. doi: 10.1080/02602938.2010.549204

Bjork, R. A., Dunlosky, J., and Kornell, N. (2013). Self-regulated learning: beliefs, techniques, and illusions. Annu. Rev. Psychol. 64, 417–444. doi: 10.1146/annurev-psych-113011-143823

Black, P., Harrison, C., Lee, C., Marshall, B., and Wiliam, D. (2003). Assessment for Learning: Putting it into Practice . Berkshire: Open University Press.

Black, P., and Wiliam, D. (1998). Inside the black box: raising standards through classroom assessment. Phi Delta Kappan 80, 139–144; 146–148.

Blanch-Hartigan, D. (2011). Medical students' self-assessment of performance: results from three meta-analyses. Patient Educ. Counsel. 84, 3–9. doi: 10.1016/j.pec.2010.06.037

Bol, L., Hacker, D. J., Walck, C. C., and Nunnery, J. A. (2012). The effects of individual or group guidelines on the calibration accuracy and achievement of high school biology students. Contemp. Educ. Psychol. 37, 280–287. doi: 10.1016/j.cedpsych.2012.02.004

Boud, D. (1995a). Implementing Student Self-Assessment, 2nd Edn. Australian Capital Territory: Higher Education Research and Development Society of Australasia.

Boud, D. (1995b). Enhancing Learning Through Self-Assessment. London: Kogan Page.

Boud, D. (1999). Avoiding the traps: Seeking good practice in the use of self-assessment and reflection in professional courses. Soc. Work Educ. 18, 121–132. doi: 10.1080/02615479911220131

Boud, D., and Brew, A. (1995). Developing a typology for learner self-assessment practices. Res. Dev. High. Educ. 18, 130–135.

Bourke, R. (2014). Self-assessment in professional programmes within tertiary institutions. Teach. High. Educ. 19, 908–918. doi: 10.1080/13562517.2014.934353

Bourke, R. (2016). Liberating the learner through self-assessment. Cambridge J. Educ. 46, 97–111. doi: 10.1080/0305764X.2015.1015963

Brown, G., Andrade, H., and Chen, F. (2015). Accuracy in student self-assessment: directions and cautions for research. Assess. Educ. 22, 444–457. doi: 10.1080/0969594X.2014.996523

Brown, G. T., and Harris, L. R. (2013). “Student self-assessment,” in Sage Handbook of Research on Classroom Assessment , ed J. H. McMillan (Los Angeles, CA: Sage), 367–393. doi: 10.4135/9781452218649.n21

Brown, G. T. L., and Harris, L. R. (2014). The future of self-assessment in classroom practice: reframing self-assessment as a core competency. Frontline Learn. Res. 3, 22–30. doi: 10.14786/flr.v2i1.24

Butler, D. L., and Winne, P. H. (1995). Feedback and self-regulated learning: a theoretical synthesis. Rev. Educ. Res. 65, 245–281. doi: 10.3102/00346543065003245

Butler, Y. G. (2018). “Young learners' processes and rationales for responding to self-assessment items: cases for generic can-do and five-point Likert-type formats,” in Useful Assessment and Evaluation in Language Education , eds J. Davis et al. (Washington, DC: Georgetown University Press), 21–39. doi: 10.2307/j.ctvvngrq.5

Chang, C.-C., Liang, C., and Chen, Y.-H. (2013). Is learner self-assessment reliable and valid in a Web-based portfolio environment for high school students? Comput. Educ. 60, 325–334. doi: 10.1016/j.compedu.2012.05.012

Chang, C.-C., Tseng, K.-H., and Lou, S.-J. (2012). A comparative analysis of the consistency and difference among teacher-assessment, student self-assessment and peer-assessment in a Web-based portfolio assessment environment for high school students. Comput. Educ. 58, 303–320. doi: 10.1016/j.compedu.2011.08.005

Colliver, J., Verhulst, S, and Barrows, H. (2005). Self-assessment in medical practice: a further concern about the conventional research paradigm. Teach. Learn. Med. 17, 200–201. doi: 10.1207/s15328015tlm1703_1

Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Rev. Educ. Res. 58, 438–481. doi: 10.3102/00346543058004438

de Bruin, A. B. H., and van Gog, T. (2012). Improving self-monitoring and self-regulation: From cognitive psychology to the classroom , Learn. Instruct. 22, 245–252. doi: 10.1016/j.learninstruc.2012.01.003

De Grez, L., Valcke, M., and Roozen, I. (2012). How effective are self- and peer assessment of oral presentation skills compared with teachers' assessments? Active Learn. High. Educ. 13, 129–142. doi: 10.1177/1469787412441284

Dolosic, H. (2018). An examination of self-assessment and interconnected facets of second language reading. Read. Foreign Langu. 30, 189–208.

Draper, S. W. (2009). What are learners actually regulating when given feedback? Br. J. Educ. Technol. 40, 306–315. doi: 10.1111/j.1467-8535.2008.00930.x

Dunlosky, J., and Ariel, R. (2011). “Self-regulated learning and the allocation of study time,” in Psychology of Learning and Motivation , Vol. 54 ed B. Ross (Cambridge, MA: Academic Press), 103–140. doi: 10.1016/B978-0-12-385527-5.00004-8

Dunlosky, J., and Rawson, K. A. (2012). Overconfidence produces underachievement: inaccurate self evaluations undermine students' learning and retention. Learn. Instr. 22, 271–280. doi: 10.1016/j.learninstruc.2011.08.003

Dweck, C. (2006). Mindset: The New Psychology of Success. New York, NY: Random House.

Epstein, R. M., Siegel, D. J., and Silberman, J. (2008). Self-monitoring in clinical practice: a challenge for medical educators. J. Contin. Educ. Health Prof. 28, 5–13. doi: 10.1002/chp.149

Eva, K. W., and Regehr, G. (2008). “I'll never play professional football” and other fallacies of self-assessment. J. Contin. Educ. Health Prof. 28, 14–19. doi: 10.1002/chp.150

Falchikov, N. (2005). Improving Assessment Through Student Involvement: Practical Solutions for Aiding Learning in Higher and Further Education . London: Routledge Falmer.

Fastre, G. M. J., van der Klink, M. R., Sluijsmans, D., and van Merrienboer, J. J. G. (2012). Drawing students' attention to relevant assessment criteria: effects on self-assessment skills and performance. J. Voc. Educ. Train. 64, 185–198. doi: 10.1080/13636820.2011.630537

Fastre, G. M. J., van der Klink, M. R., and van Merrienboer, J. J. G. (2010). The effects of performance-based assessment criteria on student performance and self-assessment skills. Adv. Health Sci. Educ. 15, 517–532. doi: 10.1007/s10459-009-9215-x

Fitzpatrick, B., and Schulz, H. (2016). “Teaching young students to self-assess critically,” Paper presented at the Annual Meeting of the American Educational Research Association (Washington, DC).

Franken, A. S. (1992). I'm Good Enough, I'm Smart Enough, and Doggone it, People Like Me! Daily affirmations by Stuart Smalley. New York, NY: Dell.

Glaser, C., and Brunstein, J. C. (2007). Improving fourth-grade students' composition skills: effects of strategy instruction and self-regulation procedures. J. Educ. Psychol. 99, 297–310. doi: 10.1037/0022-0663.99.2.297

Gonida, E. N., and Leondari, A. (2011). Patterns of motivation among adolescents with biased and accurate self-efficacy beliefs. Int. J. Educ. Res. 50, 209–220. doi: 10.1016/j.ijer.2011.08.002

Graham, S., Hebert, M., and Harris, K. R. (2015). Formative assessment and writing. Elem. Sch. J. 115, 523–547. doi: 10.1086/681947

Guillory, J. J., and Blankson, A. N. (2017). Using recently acquired knowledge to self-assess understanding in the classroom. Sch. Teach. Learn. Psychol. 3, 77–89. doi: 10.1037/stl0000079

Hacker, D. J., Bol, L., Horgan, D. D., and Rakow, E. A. (2000). Test prediction and performance in a classroom context. J. Educ. Psychol. 92, 160–170. doi: 10.1037/0022-0663.92.1.160

Harding, J. L., and Hbaci, I. (2015). Evaluating pre-service teachers math teaching experience from different perspectives. Univ. J. Educ. Res. 3, 382–389. doi: 10.13189/ujer.2015.030605

Harris, K. R., Graham, S., Mason, L. H., and Friedlander, B. (2008). Powerful Writing Strategies for All Students . Baltimore, MD: Brookes.

Harris, L. R., and Brown, G. T. L. (2013). Opportunities and obstacles to consider when using peer- and self-assessment to improve student learning: case studies into teachers' implementation. Teach. Teach. Educ. 36, 101–111. doi: 10.1016/j.tate.2013.07.008

Hattie, J., and Timperley, H. (2007). The power of feedback. Rev. Educ. Res. 77, 81–112. doi: 10.3102/003465430298487

Hawe, E., and Parr, J. (2014). Assessment for learning in the writing classroom: an incomplete realization. Curr. J. 25, 210–237. doi: 10.1080/09585176.2013.862172

Hawkins, S. C., Osborne, A., Schofield, S. J., Pournaras, D. J., and Chester, J. F. (2012). Improving the accuracy of self-assessment of practical clinical skills using video feedback: the importance of including benchmarks. Med. Teach. 34, 279–284. doi: 10.3109/0142159X.2012.658897

Huang, Y., and Gui, M. (2015). Articulating teachers' expectations afore: Impact of rubrics on Chinese EFL learners' self-assessment and speaking ability. J. Educ. Train. Stud. 3, 126–132. doi: 10.11114/jets.v3i3.753

Kaderavek, J. N., Gillam, R. B., Ukrainetz, T. A., Justice, L. M., and Eisenberg, S. N. (2004). School-age children's self-assessment of oral narrative production. Commun. Disord. Q. 26, 37–48. doi: 10.1177/15257401040260010401

Karnilowicz, W. (2012). A comparison of self-assessment and tutor assessment of undergraduate psychology students. Soc. Behav. Person. 40, 591–604. doi: 10.2224/sbp.2012.40.4.591

Kevereski, L. (2017). (Self) evaluation of knowledge in students' population in higher education in Macedonia. Res. Pedag. 7, 69–75. doi: 10.17810/2015.49

Kingston, N. M., and Nash, B. (2011). Formative assessment: a meta-analysis and a call for research. Educ. Meas. 30, 28–37. doi: 10.1111/j.1745-3992.2011.00220.x

Kitsantas, A., and Zimmerman, B. J. (2006). Enhancing self-regulation of practice: the influence of graphing and self-evaluative standards. Metacogn. Learn. 1, 201–212. doi: 10.1007/s11409-006-9000-7

Kluger, A. N., and DeNisi, A. (1996). The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychol. Bull. 119, 254–284. doi: 10.1037/0033-2909.119.2.254

Kollar, I., Fischer, F., and Hesse, F. (2006). Collaboration scripts: a conceptual analysis. Educ. Psychol. Rev. 18, 159–185. doi: 10.1007/s10648-006-9007-2

Kolovelonis, A., Goudas, M., and Dermitzaki, I. (2012). Students' performance calibration in a basketball dribbling task in elementary physical education. Int. Electron. J. Elem. Educ. 4, 507–517.

Koriat, A. (2012). The relationships between monitoring, regulation and performance. Learn. Instru. 22, 296–298. doi: 10.1016/j.learninstruc.2012.01.002

Kostons, D., van Gog, T., and Paas, F. (2012). Training self-assessment and task-selection skills: a cognitive approach to improving self-regulated learning. Learn. Instruc. 22, 121–132. doi: 10.1016/j.learninstruc.2011.08.004

Labuhn, A. S., Zimmerman, B. J., and Hasselhorn, M. (2010). Enhancing students' self-regulation and mathematics performance: the influence of feedback and self-evaluative standards Metacogn. Learn. 5, 173–194. doi: 10.1007/s11409-010-9056-2

Leach, L. (2012). Optional self-assessment: some tensions and dilemmas. Assess. Evalu. High. Educ. 37, 137–147. doi: 10.1080/02602938.2010.515013

Lew, M. D. N., Alwis, W. A. M., and Schmidt, H. G. (2010). Accuracy of students' self-assessment and their beliefs about its utility. Assess. Evalu. High. Educ. 35, 135–156. doi: 10.1080/02602930802687737

Lin-Siegler, X., Shaenfield, D., and Elder, A. D. (2015). Contrasting case instruction can improve self-assessment of writing. Educ. Technol. Res. Dev. 63, 517–537. doi: 10.1007/s11423-015-9390-9

Lipnevich, A. A., Berg, D. A. G., and Smith, J. K. (2016). “Toward a model of student response to feedback,” in The Handbook of Human and Social Conditions in Assessment , eds G. T. L. Brown and L. R. Harris (New York, NY: Routledge), 169–185.

Lopez, R., and Kossack, S. (2007). Effects of recurring use of self-assessment in university courses. Int. J. Learn. 14, 203–216. doi: 10.18848/1447-9494/CGP/v14i04/45277

Lopez-Pastor, V. M., Fernandez-Balboa, J.-M., Santos Pastor, M. L., and Aranda, A. F. (2012). Students' self-grading, professor's grading and negotiated final grading at three university programmes: analysis of reliability and grade difference ranges and tendencies. Assess. Evalu. High. Educ. 37, 453–464. doi: 10.1080/02602938.2010.545868

Lui, A. (2017). Validity of the responses to feedback survey: operationalizing and measuring students' cognitive and affective responses to teachers' feedback (Doctoral dissertation). University at Albany—SUNY: Albany NY.

Marks, M. B., Haug, J. C., and Hu, H. (2018). Investigating undergraduate business internships: do supervisor and self-evaluations differ? J. Educ. Bus. 93, 33–45. doi: 10.1080/08832323.2017.1414025

Memis, E. K., and Seven, S. (2015). Effects of an SWH approach and self-evaluation on sixth grade students' learning and retention of an electricity unit. Int. J. Prog. Educ. 11, 32–49.

Metcalfe, J., and Kornell, N. (2005). A region of proximal learning model of study time allocation. J. Mem. Langu. 52, 463–477. doi: 10.1016/j.jml.2004.12.001

Meusen-Beekman, K. D., Joosten-ten Brinke, D., and Boshuizen, H. P. A. (2016). Effects of formative assessments to develop self-regulation among sixth grade students: results from a randomized controlled intervention. Stud. Educ. Evalu. 51, 126–136. doi: 10.1016/j.stueduc.2016.10.008

Micán, D. A., and Medina, C. L. (2017). Boosting vocabulary learning through self-assessment in an English language teaching context. Assess. Evalu. High. Educ. 42, 398–414. doi: 10.1080/02602938.2015.1118433

Miller, T. M., and Geraci, L. (2011). Training metacognition in the classroom: the influence of incentives and feedback on exam predictions. Metacogn. Learn. 6, 303–314. doi: 10.1007/s11409-011-9083-7

Murakami, C., Valvona, C., and Broudy, D. (2012). Turning apathy into activeness in oral communication classes: regular self- and peer-assessment in a TBLT programme. System 40, 407–420. doi: 10.1016/j.system.2012.07.003

Nagel, M., and Lindsey, B. (2018). The use of classroom clickers to support improved self-assessment in introductory chemistry. J. College Sci. Teach. 47, 72–79.

Ndoye, A. (2017). Peer/self-assessment and student learning. Int. J. Teach. Learn. High. Educ. 29, 255–269.

Nguyen, T., and Foster, K. A. (2018). Research note—multiple time point course evaluation and student learning outcomes in an MSW course. J. Soc. Work Educ. 54, 715–723. doi: 10.1080/10437797.2018.1474151

Nicol, D., and Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Stud. High. Educ. 31, 199–218. doi: 10.1080/03075070600572090

Nielsen, K. (2014). Self-assessment methods in writing instruction: a conceptual framework, successful practices and essential strategies. J. Res. Read. 37, 1–16. doi: 10.1111/j.1467-9817.2012.01533.x

Nowell, C., and Alston, R. M. (2007). I thought I got an A! Overconfidence across the economics curriculum. J. Econ. Educ. 38, 131–142. doi: 10.3200/JECE.38.2.131-142

Nugteren, M. L., Jarodzka, H., Kester, L., and Van Merriënboer, J. J. G. (2018). Self-regulation of secondary school students: self-assessments are inaccurate and insufficiently used for learning-task selection. Instruc. Sci. 46, 357–381. doi: 10.1007/s11251-018-9448-2

Panadero, E., and Alonso-Tapia, J. (2013). Self-assessment: theoretical and practical connotations. When it happens, how is it acquired and what to do to develop it in our students. Electron. J. Res. Educ. Psychol. 11, 551–576. doi: 10.14204/ejrep.30.12200

Panadero, E., Alonso-Tapia, J., and Huertas, J. A. (2012). Rubrics and self-assessment scripts effects on self-regulation, learning and self-efficacy in secondary education. Learn. Individ. Differ. 22, 806–813. doi: 10.1016/j.lindif.2012.04.007

Panadero, E., Alonso-Tapia, J., and Huertas, J. A. (2014). Rubrics vs. self-assessment scripts: effects on first year university students' self-regulation and performance. J. Study Educ. Dev. 3, 149–183. doi: 10.1080/02103702.2014.881655

Panadero, E., Alonso-Tapia, J., and Reche, E. (2013). Rubrics vs. self-assessment scripts effect on self-regulation, performance and self-efficacy in pre-service teachers. Stud. Educ. Evalu. 39, 125–132. doi: 10.1016/j.stueduc.2013.04.001

Panadero, E., Brown, G. L., and Strijbos, J.-W. (2016a). The future of student self-assessment: a review of known unknowns and potential directions. Educ. Psychol. Rev. 28, 803–830. doi: 10.1007/s10648-015-9350-2

Panadero, E., Jonsson, A., and Botella, J. (2017). Effects of self-assessment on self-regulated learning and self-efficacy: four meta-analyses. Educ. Res. Rev. 22, 74–98. doi: 10.1016/j.edurev.2017.08.004

Panadero, E., Jonsson, A., and Strijbos, J. W. (2016b). “Scaffolding self-regulated learning through self-assessment and peer assessment: guidelines for classroom implementation,” in Assessment for Learning: Meeting the Challenge of Implementation , eds D. Laveault and L. Allal (New York, NY: Springer), 311–326. doi: 10.1007/978-3-319-39211-0_18

Panadero, E., and Romero, M. (2014). To rubric or not to rubric? The effects of self-assessment on self-regulation, performance and self-efficacy. Assess. Educ. 21, 133–148. doi: 10.1080/0969594X.2013.877872

Papanthymou, A., and Darra, M. (2018). Student self-assessment in higher education: The international experience and the Greek example. World J. Educ. 8, 130–146. doi: 10.5430/wje.v8n6p130

Punhagui, G. C., and de Souza, N. A. (2013). Self-regulation in the learning process: actions through self-assessment activities with Brazilian students. Int. Educ. Stud. 6, 47–62. doi: 10.5539/ies.v6n10p47

Raaijmakers, S. F., Baars, M., Paas, F., van Merriënboer, J. J. G., and van Gog, T. (2019). Metacognition and Learning , 1–22. doi: 10.1007/s11409-019-09189-5

Raaijmakers, S. F., Baars, M., Schapp, L., Paas, F., van Merrienboer, J., and van Gog, T. (2017). Training self-regulated learning with video modeling examples: do task-selection skills transfer? Instr. Sci. 46, 273–290. doi: 10.1007/s11251-017-9434-0

Ratminingsih, N. M., Marhaeni, A. A. I. N., and Vigayanti, L. P. D. (2018). Self-assessment: the effect on students' independence and writing competence. Int. J. Instruc. 11, 277–290. doi: 10.12973/iji.2018.11320a

Ross, J. A., Rolheiser, C., and Hogaboam-Gray, A. (1998). “Impact of self-evaluation training on mathematics achievement in a cooperative learning environment,” Paper presented at the annual meeting of the American Educational Research Association (San Diego, CA).

Ross, J. A., and Starling, M. (2008). Self-assessment in a technology-supported environment: the case of grade 9 geography. Assess. Educ. 15, 183–199. doi: 10.1080/09695940802164218

Samaie, M., Nejad, A. M., and Qaracholloo, M. (2018). An inquiry into the efficiency of whatsapp for self- and peer-assessments of oral language proficiency. Br. J. Educ. Technol. 49, 111–126. doi: 10.1111/bjet.12519

Sanchez, C. E., Atkinson, K. M., Koenka, A. C., Moshontz, H., and Cooper, H. (2017). Self-grading and peer-grading for formative and summative assessments in 3rd through 12th grade classrooms: a meta-analysis. J. Educ. Psychol. 109, 1049–1066. doi: 10.1037/edu0000190

Sargeant, J. (2008). Toward a common understanding of self-assessment. J. Contin. Educ. Health Prof. 28, 1–4. doi: 10.1002/chp.148

Sargeant, J., Mann, K., van der Vleuten, C., and Metsemakers, J. (2008). “Directed” self-assessment: practice and feedback within a social context. J. Contin. Educ. Health Prof. 28, 47–54. doi: 10.1002/chp.155

Shute, V. (2008). Focus on formative feedback. Rev. Educ. Res. 78, 153–189. doi: 10.3102/0034654307313795

Silver, I., Campbell, C., Marlow, B., and Sargeant, J. (2008). Self-assessment and continuing professional development: the Canadian perspective. J. Contin. Educ. Health Prof. 28, 25–31. doi: 10.1002/chp.152

Siow, L.-F. (2015). Students' perceptions on self- and peer-assessment in enhancing learning experience. Malaysian Online J. Educ. Sci. 3, 21–35.

Son, L. K., and Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. J. Exp. Psychol. 26, 204–221. doi: 10.1037/0278-7393.26.1.204

Tan, K. (2004). Does student self-assessment empower or discipline students? Assess. Evalu. Higher Educ. 29, 651–662. doi: 10.1080/0260293042000227209

Tan, K. (2009). Meanings and practices of power in academics' conceptions of student self-assessment. Teach. High. Educ. 14, 361–373. doi: 10.1080/13562510903050111

Taras, M. (2008). Issues of power and equity in two models of self-assessment. Teach. High. Educ. 13, 81–92. doi: 10.1080/13562510701794076

Tejeiro, R. A., Gomez-Vallecillo, J. L., Romero, A. F., Pelegrina, M., Wallace, A., and Emberley, E. (2012). Summative self-assessment in higher education: implications of its counting towards the final mark. Electron. J. Res. Educ. Psychol. 10, 789–812.

Thawabieh, A. M. (2017). A comparison between students' self-assessment and teachers' assessment. J. Curri. Teach. 6, 14–20. doi: 10.5430/jct.v6n1p14

Tulgar, A. T. (2017). Selfie@ssessment as an alternative form of self-assessment at undergraduate level in higher education. J. Langu. Linguis. Stud. 13, 321–335.

van Helvoort, A. A. J. (2012). How adult students in information studies use a scoring rubric for the development of their information literacy skills. J. Acad. Librarian. 38, 165–171. doi: 10.1016/j.acalib.2012.03.016

van Loon, M. H., de Bruin, A. B. H., van Gog, T., van Merriënboer, J. J. G., and Dunlosky, J. (2014). Can students evaluate their understanding of cause-and-effect relations? The effects of diagram completion on monitoring accuracy. Acta Psychol. 151, 143–154. doi: 10.1016/j.actpsy.2014.06.007

van Reybroeck, M., Penneman, J., Vidick, C., and Galand, B. (2017). Progressive treatment and self-assessment: Effects on students' automatisation of grammatical spelling and self-efficacy beliefs. Read. Writing 30, 1965–1985. doi: 10.1007/s11145-017-9761-1

Wang, W. (2017). Using rubrics in student self-assessment: student perceptions in the English as a foreign language writing context. Assess. Evalu. High. Educ. 42, 1280–1292. doi: 10.1080/02602938.2016.1261993

Wollenschläger, M., Hattie, J., Machts, N., Möller, J., and Harms, U. (2016). What makes rubrics effective in teacher-feedback? Transparency of learning goals is not enough. Contemp. Educ. Psychol. 44–45, 1–11. doi: 10.1016/j.cedpsych.2015.11.003

Yan, Z., and Brown, G. T. L. (2017). A cyclical self-assessment process: towards a model of how students engage in self-assessment. Assess. Evalu. High. Educ. 42, 1247–1262. doi: 10.1080/02602938.2016.1260091

Yilmaz, F. N. (2017). Reliability of scores obtained from self-, peer-, and teacher-assessments on teaching materials prepared by teacher candidates. Educ. Sci. 17, 395–409. doi: 10.12738/estp.2017.2.0098

Zimmerman, B. J. (2000). Self-efficacy: an essential motive to learn. Contemp. Educ. Psychol. 25, 82–91. doi: 10.1006/ceps.1999.1016

Zimmerman, B. J., and Schunk, D. H. (2011). “Self-regulated learning and performance: an introduction and overview,” in Handbook of Self-Regulation of Learning and Performance , eds B. J. Zimmerman and D. H. Schunk (New York, NY: Routledge), 1–14.

Keywords: self-assessment, self-evaluation, self-grading, formative assessment, classroom assessment, self-regulated learning (SRL)

Citation: Andrade HL (2019) A Critical Review of Research on Student Self-Assessment. Front. Educ. 4:87. doi: 10.3389/feduc.2019.00087

Received: 27 April 2019; Accepted: 02 August 2019; Published: 27 August 2019.

Copyright © 2019 Andrade. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Heidi L. Andrade, handrade@albany.edu

This article is part of the Research Topic: Advances in Classroom Assessment Theory and Practice.

Writing Assessments

When you want students to understand how writing is graded, turn to our vast selection of assessment examples. You'll find elementary and middle school models in all of the major modes of writing, along with rubrics that assess each example as "Strong," "Good," "Okay," or "Poor." You can also download blank rubrics to use with your own students.

Discover more writing assessment tools for Grade 2, Grade 3, Grades 4-5, Grades 6-8, Grades 9-10, and Grades 11-12—including writing on tests and responding to prompts.


Response to Literature

  • Julius the Baby of the World (Book Review, Strong); One Great Book (Book Review, Good); Dear Mr. Marc Brown (Book Review, Okay); Snowflake Bentley (Book Review, Poor)

Explanatory Writing

  • 4th of July Traditions (Explanatory Essay, Strong); Happy Halloween (Explanatory Essay, Good); Turkey Day (Explanatory Essay, Okay); Forth of July (Explanatory Essay, Poor)

Persuasive Writing

  • Get a Dog (Persuasive Paragraph, Strong); Please Be Kind (Persuasive Paragraph, Good); Let Me stay up with Shane (Persuasive Paragraph, Okay); We need Bedder Chips (Persuasive Paragraph, Poor)

Narrative Writing

  • The Horrible Day (Personal Narrative, Strong); Keeping the Dressing (Personal Narrative, Good); Friday (Personal Narrative, Okay); Dee Dees hose (Personal Narrative, Poor)
  • Mildew Houses (Description, Strong); Grandpa's Face (Description, Good); A Cool Restrant (Description, Okay); Hot Dogs (Description, Poor)
  • How to Bake a Cake (How-To, Strong); Make a Blow-Up Box (How-To, Good); How to Feed a Dog (How-To, Okay); A Kite (How-To, Poor)
  • Recycling Jars and Cans (How-To, Strong); Getting to the Park (How-To, Good); Planting a Garden (How-To, Okay); How to Pull a tooth (How-To, Poor)
  • Zev's Deli (Description, Strong); Our Horses (Description, Good); My Favorit Lake (Description, Okay); The Zoo (Description, Poor)
  • Thunderstorm! (Narrative Paragraph, Strong); My Trip to the Zoo (Narrative Paragraph, Good); My Lost Puppy (Narrative Paragraph, Okay); My Trip (Narrative Paragraph, Poor)
  • The Sled Run (Personal Narrative, Strong); The Funny Dance (Personal Narrative, Good); Texas (Personal Narrative, Okay); A Sad Day (Personal Narrative, Poor)
  • No School (Persuasive Paragraph, Strong); Dogs Stay Home! (Persuasive Paragraph, Good); A New Pool (Persuasive Paragraph, Okay); A Bigger Cafaterea (Persuasive Paragraph, Poor)
  • New Sidewalks (Persuasive Paragraph, Strong); Don't Burn Leaves (Persuasive Paragraph, Good); The Ginkgo Trees (Persuasive Paragraph, Okay); Turn Your Lights Off (Persuasive Paragraph, Poor)
  • The Year Mom Won the Pennant (Book Review, Strong); A Story of Survival (Book Review, Good); Keep Reading! (Book Review, Okay); Homecoming (Book Review, Poor)
  • A Shipwreck at the Bottom of the World (Book Review, Strong); A Letter of Recommendation (Book Review, Good); Falling Up (Book Review, Okay); The Cat Ate My Gymsuit (Book Review, Poor)

Research Writing

  • The Snow Leopard (Report, Strong); The Great Pyramid of Giza (Report, Good); Koalas (Report, Okay); Ladybugs (Report, Poor)
  • The Platypus (Report, Strong); The Click Beetle (Report, Good); Martin Luther King, Junior’s Dream (Report, Okay); Crickets and Grasshoppers (Report, Poor)
  • Talent Show and Tell (Persuasive Essay, Strong); Art Every Day (Persuasive Essay, Good); More Recess, Please (Persuasive Essay, Okay); Let Us Eat (Persuasive Essay, Poor)
  • Help Save Our Manatees (Persuasive Essay, Strong); A Fictional Letter to President Lincoln (Persuasive Essay, Good); Endangered Animals (Persuasive Essay, Okay); Why Smog Is Bad (Persuasive Essay, Poor)
  • Food from the Ocean (Explanatory Essay, Strong); How to Make a S’More (Explanatory Essay, Good); The Person I Want to Be (Explanatory Essay, Okay); Sleepover (Explanatory Essay, Poor)
  • Something You Can Sink Your Teeth Into (Explanatory Essay, Strong); Bathing a Puppy (Explanatory Essay, Good); Trading Places (Explanatory Essay, Okay); Fluffy (Explanatory Essay, Poor)
  • When I Got Burned on Dad’s Motorcycle (Personal Narrative, Strong); My First Home Run (Personal Narrative, Good); My Worst Scrape (Personal Narrative, Okay); The Trip to the Woods (Personal Narrative, Poor)
  • Soggy Roads (Personal Narrative, Strong); The Broken Statue (Personal Narrative, Good); Space Monster (Personal Narrative, Okay); Las Vegas (Personal Narrative, Poor)
  • Departure (Personal Narrative, Strong); A January Surprise (Personal Narrative, Good); A Day I'll Never Forget (Personal Narrative, Okay); My Summer in Jacksonville, Florida (Personal Narrative, Poor)
  • Puppy (Personal Narrative, Strong); A New Friend (Personal Narrative, Good); My Summer in Michigan (Personal Narrative, Okay); A Horrible Day (Personal Narrative, Poor)
  • Dear Mr. Rhys (Biography, Strong); Turning 13 (Biography, Good); My Resident Edith (Biography, Okay); Police Officer (Biography, Poor)
  • Iron Summary (Strong) (Summary, Strong); Iron Summary (Good) (Summary, Good); Iron Summary (Okay) (Summary, Okay); Iron Summary (Poor) (Summary, Poor)
  • Paper Recycling (Explanatory Essay, Strong); Letter to France (Explanatory Essay, Good); I Have a Dream . . . Too (Explanatory Essay, Okay); Fire Fighter (Explanatory Essay, Poor)
  • Mount Rushmore’s Famous Faces (Explanatory Essay, Strong); Youth Movements in Nazi Germany (Explanatory Essay, Good); My Personal Values (Explanatory Essay, Okay); The Influence of Gangs in Our Community (Explanatory Essay, Poor)
  • Malcolm X and Eleanor Roosevelt (Comparison-Contrast, Strong); How to Make Tabouli (Comparison-Contrast, Good); Yo-Yo’s Flood Del Mar Hills School (Comparison-Contrast, Okay); The Gail Woodpecker (Comparison-Contrast, Poor)
  • Railroad to Freedom (Book Review, Strong); If I Were Anne Frank (Book Review, Good); To Kill a Mockingbird (Book Review, Okay); Good Brother or Bad Brother? (Book Review, Poor)
  • The Power of Water (Book Review, Strong); Summary Review: Arranging a Marriage (Book Review, Good); Freaky Friday (Book Review, Okay); No Friend of Mine (Book Review, Poor)
  • The Aloha State (Research Report, Strong); Tornadoes (Research Report, Good); Earthquakes (Research Report, Okay); The Bombing of Peal Harbor (Research Report, Poor)
  • Wilma Mankiller: Good Times and Bad (Research Report, Strong); Green Anaconda (Research Report, Good); The Great Pyramid (Research Report, Okay); Poodles (Research Report, Poor)

Business Writing

  • Using Hydrochloric Acid (Strong) (Instructions, Strong); Using Hydrochloric Acid (Good) (Instructions, Good); Using Hydrochloric Acid (Okay) (Instructions, Okay); Using Hydrochloric Acid (Poor) (Instructions, Poor)
  • Dear Dr. Larson (Strong) (Persuasive Letter, Strong); Dear Dr. Larson (Good) (Persuasive Letter, Good); Dear Dr. Larson (Okay) (Persuasive Letter, Okay); Dear Dr. Larson (Poor) (Persuasive Letter, Poor)
  • Smoking in Restaurants (Persuasive Essay, Strong); Letter to the Editor (Arts) (Persuasive Essay, Good); Toilet-to-Tap Water (Persuasive Essay, Okay); The Unperminent Hair Dye Rule (Persuasive Essay, Poor)
  • Capital Punishment Is Wrong! (Persuasive Essay, Strong); Letter to the Editor (Cheating) (Persuasive Essay, Good); Letter to the Editor (Immigration) (Persuasive Essay, Okay); Judge Not (Persuasive Essay, Poor)
  • Revisiting Seneca Falls (Research Report, Strong); The Importance of Cinco de Mayo (Research Report, Good); The Meaning of Juneteenth (Research Report, Okay); Russian Missile Problem (Research Report, Poor)
  • Dear Ms. Holloway (Business Letter, Strong); Dear Mr. McNulty (Business Letter, Good); Dear Mr. Underwood (Business Letter, Okay); Dear Mrs. Jay (Business Letter, Poor)
  • Scout Takes Another Look (Literary Analysis, Strong); Rocket Boys: A Memoir (Literary Analysis, Good); A Wrinkle in Time (Literary Analysis, Okay); Being True to Yourself: The Call of the Wild (Literary Analysis, Poor)
  • Evening the Odds (Argument Essay, Strong); Lack of Respect a Growing Problem (Argument Essay, Good); The Right to Dress (Argument Essay, Okay); Grading Students on Effort (Argument Essay, Poor)
  • Sinking the Unsinkable (Explanatory Essay, Strong); The Best Preventive Medicine (Explanatory Essay, Good); The Ozone Layer (Explanatory Essay, Okay); Measurement (Explanatory Essay, Poor)
  • Isn't It Romantic? (Definition, Strong); Good and Angry (Definition, Good); Unsung Heroes (Definition, Okay); Love (Definition, Poor)
  • People Power (Personal Narrative, Strong); It's a Boy (Personal Narrative, Good); A Senior Moment (Personal Narrative, Okay); A Big Family Wedding (Personal Narrative, Poor)
  • Understanding Hmong Americans (MLA Research Paper, Strong); Hmong: From Allies to Neighbors (MLA Research Paper, Good); Welcome the Hmong to America (MLA Research Paper, Okay); Hmong People (MLA Research Paper, Poor)

Narrative Writing, Creative Writing

  • Putin Meddles in U.S. Casseroles (Satirical News Story, Strong); Cabinet Secretaries Now Cabinet Office Assistants (Satirical News Story, Good); Area Man Teaches Ways to Check for B.O. (Satirical News Story, Okay); Global Warming Is Weather-Dependent (Satirical News Story, Poor)
  • Poverty and Race as Predictors in the 2016 Presidential Election (Statistical Analysis, Strong); Poverty and Race in the 2016 Election (Statistical Analysis, Good); AP Stats Project (Statistical Analysis, Okay); Stats Analysis (Statistical Analysis, Poor)
  • Renewable and Carbon-Neutral (Problem-Solution, Strong); The Ethanol Revolution (Problem-Solution, Good); Growing Energy (Problem-Solution, Okay); Corn for the Future (Problem-Solution, Poor)
  • Generations of America (Speech, Strong); Inauguration Speech of the 49th U.S. President (Speech, Good); The Greatest Inauguration Speech (Speech, Okay); What I Will Do for This Country (Speech, Poor)
  • True Leadership (Personal Essay, Strong); A Thing of Beauty (Personal Essay, Good); How I Will Contribute to College (Personal Essay, Okay); What Education Means (Personal Essay, Poor)
  • Setting in Crane and O'Connor (Literary Analysis, Strong); The Scouring of the Shire (Literary Analysis, Good); Setting in Calvin and Hobbes (Literary Analysis, Okay); On Golden Pond (Literary Analysis, Poor)
  • Jane Eyre and the Perils of Sacrifice (Literary Analysis, Strong); Mrs. Reed as a Tragic Figure (Literary Analysis, Good); A Lack of Love (Literary Analysis, Okay); Bad Choices, Bad Results (Literary Analysis, Poor)


Published on 1.4.2024 in Vol 26 (2024)

The Challenges in Using eHealth Decision Resources for Surrogate Decision-Making in the Intensive Care Unit

Authors of this article:

  • Wan-Na Sun 1, MSN
  • Chi-Yin Kao 2, PhD

1 Department of Nursing, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan

2 Department of Nursing, College of Medicine, National Cheng Kung University, Tainan, Taiwan

Corresponding Author:

Chi-Yin Kao, PhD

Department of Nursing, College of Medicine

National Cheng Kung University

1 University Road

Tainan, 701

Phone: 886 6 2353535 ext 5843

Fax: 886 6 2377550

Email: [email protected]

The mortality rate in intensive care units (ICUs) is notably high, with patients often relying on surrogates for critical medical decisions due to their compromised state. This paper provides a comprehensive overview of eHealth. The challenges of applying eHealth tools, including economic disparities and information inaccuracies, are addressed. This study then introduces eHealth literacy and the assessment tools used to evaluate users’ capability and literacy levels in using eHealth resources. A clinical scenario involving surrogate decision-making is presented. This simulated case involves a patient with a hemorrhagic stroke who has lost consciousness and requires medical procedures such as a tracheostomy. However, due to the medical surrogate’s lack of familiarity with eHealth devices and limited literacy in using eHealth resources, difficulties arise in assisting the patient in making medical decisions. This scenario highlights challenges related to eHealth literacy, and solution strategies are proposed. In conclusion, effective ICU decision-making with eHealth tools requires a careful balance between efficiency and inclusivity. Tailoring communication strategies and providing diverse materials are essential for effective eHealth decision resources in the ICU setting. Health professionals should adopt a patient-centered approach to enhance the decision-making experience, particularly for individuals with limited eHealth literacy.

Introduction

Decision-Making by Patient Surrogates

The mortality rate of patients in intensive care units (ICUs) exceeds that of those in the general wards. The Society of Critical Care Medicine reported a mortality rate ranging from 10% to 29% in adult ICUs [ 1 ]. In Taiwan, the mortality rate in ICUs is 11.6% [ 2 ]. The patients in ICUs, grappling with impaired cardiopulmonary functions and severe illness, are in critical condition and are typically physically vulnerable or even in an unconscious state [ 3 ]. They depend on the health care team for basic daily activities such as eating and toileting. In addition, given their critical condition, a variety of complex medical treatments, such as endotracheal intubation and ventilator use, are often required to maintain their vital signs [ 4 ]. Coupled with the administration of multiple medications, patients often find it challenging to express their preferences. If patients have not previously provided an advance directive, they must rely on surrogates for medical decisions. Statistics indicate that 95% of surrogates make at least one major medical decision upon the patient’s ICU admission [ 4 ]. Another study highlights that surrogates are involved in up to 71% of medical decisions within the first 48 hours of the patient’s ICU stay [ 5 , 6 ], underscoring the prevalence of surrogates in guiding medical decisions for critically ill patients. However, the decision-making process places significant pressure on surrogates, forcing them to reach medical decisions quickly in collaboration with health professionals [ 5 , 6 ]. This pressure can lead to conflicts in decision-making and potentially disrupt the harmony of the medical-patient relationship, all exacerbated by the psychological strain of the possible loss of a loved one and time constraints.

Miller et al [ 4 ] noted that 30% of surrogates experienced anxiety and depression, and up to 50% even developed posttraumatic stress disorder. In addition, surrogates possess varying levels of information and knowledge regarding medical treatment, which can influence their capacity to assess and participate in medical decision-making, consequently affecting their ability to comprehend information while communicating with health professionals and making decisions on behalf of the patients [ 7 ].

With the advancement of health care technology and the catalyzing effect of the pandemic, the use of eHealth tools to assist surrogates in decision-making has become a trend. However, it may pose several challenges. For instance, surrogates may require time to learn to use these eHealth tools effectively [ 8 , 9 ]. Health care professionals must provide effective training and support to ensure that surrogates can fully use these tools. During the use of eHealth tools, there should also be a mechanism for immediate clarification if any doubts arise [ 8 , 10 ]. Furthermore, despite the use of eHealth tools, face-to-face communication remains essential for surrogates. Through in-person observations, one can understand and clarify considerations related to medical interventions stemming from different cultures and family backgrounds [ 10 ]. This is vital to genuinely assist surrogates in making informed decisions.

Use of eHealth Tools in Decision-Making

The term “eHealth” has gained popularity. In 2016, the World Health Organization (WHO) introduced the term to encompass the latest advancements in medical technology, particularly the use of information and communication technology (ICT) to enhance the function and efficiency of the health care system. eHealth encompasses more than just internet medicine; it also includes telehealth and telemedicine. It is a dynamic field in a constant state of evolution, signifying not only technological progress but also shifts in how we envision improving the health care landscape [ 11 , 12 ].

Scholars regard eHealth as an emerging domain of this generation. It is considered a highly convenient, easily accessible, and user-friendly tool that captures the attention of users [ 11 , 12 ]. Among the most widely used eHealth services is the dissemination of health care information through means such as QR codes, videos, and websites [ 9 ]. Many developing countries are transitioning away from paper-based methods, integrating hospital data through ICT, replacing conventional patient care models, and using technology to alleviate the workload on nursing staff, thereby reducing unnecessary costs for individuals and society and easing economic and medical burdens. Research has revealed that over 50% of users seek health- or disease-related information through eHealth, while up to 60% make decisions based on the information they find [ 8 ]. Moreover, in high-income countries, nearly 80% of adults regularly use the internet to access health information [ 10 ].

Furthermore, by disseminating pertinent medical and disease-related information through internet platforms and other channels, health professionals can enhance health care knowledge among the public, patients, and their families. This fosters greater awareness of illnesses and facilitates immediate diagnosis and treatment for health concerns, which significantly augments health literacy and diminishes the number of patients facing serious illnesses [ 10 ]. In recent years, there has been extensive use of the patient decision aid tools developed using eHealth technology. Some scholars even highlight the potential for eHealth to enhance equitable medical care for remote areas and among patients with limited mobility [ 10 ].

Nevertheless, health professionals should carefully consider whether eHealth is suitable for all groups and diseases as a decision-making tool. This is an issue that warrants our attention, as indiscriminate use across all demographics might not be advisable.

Challenges of Applying eHealth Tools

The application of eHealth tools can present a range of challenges, including economic disparities, information inaccuracies, and the potential exclusion of specific groups. For instance, government agencies should formulate policies and ensure the provision of adequate equipment. Failure to do so may result in unequal access to eHealth tools, favoring higher socioeconomic groups while leaving lower socioeconomic groups without access. This misalignment with the principle of equitable health care is a concern, as some medical institutions may initially refuse to offer services for marginalized groups such as Black communities and low-income individuals [ 9 ]. Consequently, a comprehensive support system should be in place before the implementation of eHealth tools.

When conveying medical and disease-related information, it is crucial to consider the risk of patients’ privacy being compromised or the possibility of conveying inaccurate and misleading messages [ 9 ]. For instance, Ruppert et al [ 13 ] searched for medical information on the social platform YouTube, focusing on terms such as skin cancer and sunscreen. Their findings indicated that up to 40% of the messages contained misinformation and 20% of the messages were potentially misleading [ 13 ]. Swartzendruber et al [ 14 ] explored the Women’s Resource Center of China and found that up to 53% of the content provided incorrect information about breast cancer treatment, with 84% of the website content being dedicated to advertising programs.

These findings underscore the fact that, despite the eHealth revolution, there are still numerous unresolved issues that require attention. In addition to effectively harnessing technology for caregiving, careful planning is essential to address the potential pitfalls associated with technology. In this era, where eHealth is a prevailing trend, it is crucial to consider how we can make better and more user-friendly use of it. Therefore, there is a need for tools to assess the suitability of patients for using eHealth tools.

eHealth Literacy and Assessment Tools

As medical technology advances, an increasing number of countries are actively promoting national health literacy through ICT. The main purpose of using eHealth technologies is to empower users to make informed health decisions by finding, comprehending, and discerning relevant health information, and then applying this knowledge to address health-related concerns [ 15 ]. It is important to note that eHealth literacy differs from traditional health literacy in that it entails using electronic resources to access, understand, and distinguish pertinent health information and then apply this knowledge to deal with health issues [ 15 , 16 ]. It is a dynamic process influenced by factors such as the user's background, knowledge, educational level, and health status, and it can lead to different motivations and strategies in various contexts [ 15 , 17 ].

However, if only a limited number of people participate in or use the eHealth tools, the goal of universalization for these tools is compromised; accordingly, it is important to assess users’ capability and literacy level to provide them with appropriate eHealth tools to enhance their health literacy.

Norman and Skinner [ 15 ] introduced the eHealth Literacy Lily Model ( Figure 1 ), which consists of 2 categories, analytic and context-specific. The model comprises 6 components of literacy: traditional, information, media, health, computer, and scientific. The analytic components (traditional, information, and media) mainly evaluate the users’ fundamental skills and are not directly tied to the content of health literacy. In contrast, the context-specific components (health, computer, and scientific) are more focused on context-specific and computational skills. The following section elaborates on these 6 components of the eHealth Literacy Lily Model. Table 1 provides more details.

[Figure 1: The eHealth Literacy Lily Model]
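To make the model's structure concrete, the sketch below encodes the two categories and six literacies as a simple Python mapping and flags the components a user reports difficulty with. The grouping follows the description above; the 1-5 self-rating input and the cutoff of 3 are illustrative assumptions, not part of the published model.

```python
# Minimal sketch of the eHealth Literacy Lily Model as a data structure.
# The grouping follows the six literacies described above; the 1-5 self-rating
# scale and the cutoff of 3 are illustrative assumptions, not part of the model.

LILY_MODEL = {
    "analytic": ["traditional", "information", "media"],
    "context-specific": ["health", "computer", "scientific"],
}

def flag_deficits(ratings: dict[str, int], cutoff: int = 3) -> dict[str, list[str]]:
    """Return, per category, the literacy components rated below the cutoff."""
    return {
        category: [c for c in components if ratings.get(c, 0) < cutoff]
        for category, components in LILY_MODEL.items()
    }

# Example: a user comfortable with reading but unfamiliar with computers.
print(flag_deficits({"traditional": 4, "information": 3, "media": 3,
                     "health": 2, "computer": 1, "scientific": 2}))
# {'analytic': [], 'context-specific': ['health', 'computer', 'scientific']}
```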

Grounding online health messages in science is a common approach in health literacy, but it can pose a significant challenge for individuals who lack adequate education, particularly since users without these 6 core skills could face difficulties or obstacles when trying to adopt eHealth [ 9 , 15 ]. For this reason, assessing users’ health literacy has become an essential step before implementing eHealth, and the availability of quantitative data enables health professionals to provide evidence-based and useful eHealth tools for users.

Norman and Skinner [ 15 ] developed the eHealth Literacy Scale in English, and it has subsequently been translated into multiple languages, including Amharic, traditional and simplified Chinese, Dutch, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Korean, Persian, Polish, Portuguese, Norwegian, and Swedish [ 18 ]. The questionnaire consists of 8 questions rated on a 5-point Likert scale, ranging from 1 for “not useful at all” to 5 for “very useful.” Total scores range from 8 to 40, with higher scores indicating a better ability to use eHealth; a score of ≤31 points indicates a lower level of eHealth literacy. Previous studies have reported a high internal consistency coefficient (α = .88) [ 19 , 20 ]. Further details are provided in Table 2 .

Table 2 response scale: 1 = not useful at all; 2 = not useful; 3 = unsure; 4 = useful; 5 = very useful.
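To make the scoring rule above concrete, here is a minimal Python sketch that sums the 8 Likert items and applies the ≤31 cutoff for lower eHealth literacy. Only the scoring rule comes from the scale description; the function and variable names are illustrative.

```python
# Minimal sketch of eHEALS scoring as described above: 8 items rated 1-5,
# summed to a total, with totals of 31 or less flagged as lower eHealth literacy.
# Function and variable names are illustrative, not part of the published scale.

def score_eheals(item_responses: list[int]) -> tuple[int, bool]:
    """Return (total score, low-literacy flag) for eight 1-5 Likert responses."""
    if len(item_responses) != 8 or not all(1 <= r <= 5 for r in item_responses):
        raise ValueError("eHEALS expects exactly 8 responses, each between 1 and 5")
    total = sum(item_responses)
    return total, total <= 31

# In the clinical scenario below, Yuliya's responses sum to the scale minimum of 8.
total, low_literacy = score_eheals([1] * 8)
print(total, low_literacy)  # 8 True
```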

Clinical Scenario

In this section, we present a simulated clinical scenario to address the challenges associated with applying an eHealth tool. We then use the eHealth Literacy Lily Model and the eHealth Literacy Scale to analyze the situation and propose potential solutions.

Yuliya, a 68-year-old woman, lives with her husband, Eric. Their 2 children are working abroad, and they communicate via internet phone calls. Yuliya does not have any underlying medical conditions and regularly attends government-provided health checkups, following the advice of her neighbors. Her primary sources of health information are her children and neighbors.

Eric, on the other hand, has a complex medical history, including diabetes, chronic kidney disease stage IV, and congestive heart failure stage 2 according to the New York Heart Association. In 2020, Yuliya returned home to find Eric unconscious. After Eric was sent to the emergency room, his cognitive alertness was unclear, EVM of Glasgow Coma Scale (GCS) was E1V2M3; he was then diagnosed with COVID-19 (cycle threshold value 19.5 / polymerase chain reaction showed positive) and acute respiratory failure with chest x-ray showing pneumonia and SpO 2 range of 77%-82%. The doctor performed tracheal intubation with a ventilator, and Eric was subsequently transferred to the ICU. The doctor informed Yuliya that Eric's consciousness had not recovered and that he needed a computed tomography (CT) scan. The doctor hoped that Yuliya could make an immediate decision regarding the CT scan after the explanation. However, Yuliya was terrified and did not understand the doctor's explanation; nevertheless, she consented to the CT scan. Sadly, Eric was diagnosed with a hemorrhagic stroke of basal ganglia, and even after 2 weeks, he was still unable to breathe spontaneously due to the stroke and he remained in an unconscious state. The doctor recommended a tracheostomy for Eric, proposing that Eric needed long-term care in the future. However, Yuliya was faced with a difficult decision regarding the procedure.

To facilitate Yuliya’s understanding, the nurse provided her with a QR code to access a video. The nurse emphasized that the video contained information about tracheostomy procedures and care. Yuliya was instructed to watch the video and decide whether Eric should undergo the tracheostomy. However, Yuliya encountered challenges as she was unfamiliar with how to use the QR code, and she had limited literacy skills, with her only means of communication being in Taiwanese. Additionally, the video was in Chinese, which further compounded her difficulties. This left her with numerous unanswered questions and uncertainties.

Due to the time difference, Yuliya could not communicate with her children abroad in a timely manner, so upon returning home, Yuliya sought assistance from her neighbors to watch the video, but unfortunately, her neighbors could not operate the QR code either. Instead, they shared their limited knowledge of tracheostomy surgery, drawing from soap opera scenarios. This misconception led Yuliya to believe that patients who had strokes must be bedridden and entirely dependent on others for care.

Meanwhile, the Health Bureau contacted her and requested her to use a social distancing app to monitor Eric’s condition. Yuliya informed the contact person that she did not know how to use mobile phone apps or browse the internet.

Analysis of the Problem

After evaluating this case using the eHealth Literacy Lily Model and the eHealth Literacy Scale, it became clear that Yuliya faced challenges in all 6 components of literacy ( Table 3 ). Additionally, Yuliya’s total score on the eHealth Literacy Scale was 8, indicating her limited ability to use eHealth resources effectively.

Solution Strategies for This Clinical Scenario

According to the Analytic Type

For individuals with limited traditional, information, and media literacy skills, it is highly recommended to facilitate communication through family meetings. Family meetings can provide comprehensive opportunities for health professionals, patients, or decision makers to exchange information, explain treatments, and clarify questions [ 21 ]. Effective communication skills are crucial in this process, and health professionals should use listening skills, questioning skills, and effective message delivery to enhance understanding and facilitate shared decision-making.

Listening Skills

Health professionals should encourage patients or surrogates to express their thoughts and concerns. By creating a safe space for open dialogue, they can increase the patients’ or surrogates’ willingness to communicate and provide emotional support. Observing nonverbal cues and summarizing patient or surrogate expectations can help ensure effective communication.

Questioning Skills

The use of open-ended and closed-ended questions is essential to discuss and clarify the patient's preferences and concerns. This approach allows health professionals to gain insight into patients’ or surrogates’ thoughts and needs.

Sending Messages

Health professionals must provide clear and understandable explanations of treatment options, avoiding the use of medical jargon or technical terms [ 22 ]. By offering evidence-based information and using empirical medical research findings to illustrate the advantages and disadvantages of each treatment option, they can empower patients to make informed decisions [ 23 ].

In this clinical scenario, the health professionals did not provide an appropriate environment or sufficient time for Yuliya to discuss Eric's medical treatment and address her concerns. Instead, they simply offered a QR code to access a video, assuming it would efficiently convey the necessary information. By using effective communication strategies during family meetings and providing patient-centered explanations and support, health professionals can significantly improve the decision-making process for patients with limited eHealth literacy.

According to the Context-Specific Type

Individuals with health, computer, and scientific literacy deficits might face significant challenges in accessing essential medical information, potentially leading to difficulties and delays in making informed medical decisions. To enhance individuals’ health literacy in such cases, the government can use various strategies including television advertisements, hospital promotions, public health clinic campaigns, and even celebrity endorsements to increase public awareness and understanding of health-related topics [ 2 ].

In the presented clinical scenario, health professionals could take a proactive approach by providing Yuliya with various prepared materials such as notebooks, pictures, DVDs, videos, QR codes, and apps. They should offer guidance on using these tools effectively and assess Yuliya’s understanding of the information provided by eHealth decision resources. The development of decision aids should adhere to specific criteria, including a rigorous and systematic process, the provision of scientific information and data on medical options, assistance in clarifying decision makers’ intentions and preferences, transparency regarding potential conflicts of interest, the avoidance of biased language that might lead to the selection of a particular medical option, and the exclusion of medical jargon or technical terms. Decision aids should provide evidence-based information to empower patients or surrogates to make informed choices [ 24 ].

Given the current global trend of an aging population, such decision-making aids are suitable for individuals older than 60 years, as they often have a higher demand for health care information. When creating these aids, it is essential to consider 3 critical components: economic costs, health and recovery aspects, and social and environmental support. Despite the increasing availability of technology-based decision-making aids, printed materials in large print, colorful chart cards, or brochures remain the most widely accepted format among patients and their surrogates, especially for older populations [ 25 ].

Summary of This Case

In this clinical scenario, health professionals did not take Yuliya’s eHealth literacy into account when providing resources, even though she needed to make immediate decisions. To address this issue in the future, we recommend the following actions: first, health professionals should take the time to explain the conditions for a patient's admission to the ICU thoroughly. This includes providing clear information about the patient’s medical condition, the available treatment options, and the potential risks and benefits associated with each choice. Second, when circumstances permit, health professionals should engage in open and transparent communication with patients and their families. They should offer opportunities for patients to ask questions and express their concerns regarding daily urgent medical treatment decisions. Third, it is crucial to provide Yuliya and patients like her with sufficient and understandable information that empowers them to make informed decisions about their health care. This may involve using plain language, visual aids, or written materials to enhance their understanding.

By implementing these recommendations, health professionals could improve the overall patient experience and facilitate more effective shared decision-making, especially for individuals with limited eHealth literacy like Yuliya.

Recommendations for eHealth Issues in ICUs

The use of eHealth tools has become an irresistible trend. To address this issue in the future, we recommend the following actions: (1) education and training: ensure that all health professionals receive appropriate training on how to use eHealth tools; (2) content and usability: when developing eHealth tools, ensure that the provided content is accurate, aligns with the needs and expectations of surrogates, and enhances usability through a user-friendly interface, with real-time technical support available to prevent any disruptions for surrogates during usage; and (3) ongoing evaluation: conduct regular assessments of the eHealth tools and establish an evaluation framework that incorporates data and feedback for continuous updates and adjustments based on the surrogates' needs.

In summary, effectively integrating eHealth tools can improve the quality and efficiency of ICU care. However, it is imperative to ensure that these tools not only meet medical standards but also consider patient privacy and data security. Therefore, strategies such as education, interoperability, security, and user-friendliness are key to successfully meeting surrogates' expectations with eHealth tools.

Conclusions

In conclusion, the challenges associated with using eHealth decision resources to assist surrogates in ICU decision-making are multifaceted and require careful consideration. The high mortality rate in ICUs, when coupled with critical conditions, compels swift decision-making by surrogates, and the increasing popularity of eHealth tools adds complexity to this process.

The application of eHealth tools brings both opportunities and challenges. While these tools have the potential to disseminate important medical information and improve health care knowledge, they also present economic, informational, and inclusivity challenges. Economic disparities, information inaccuracies, and the risk of excluding specific groups must be addressed to ensure equitable access and benefit for all.

The eHealth Literacy Lily Model and the eHealth Literacy Scale are valuable tools for assessing individuals’ capability and literacy levels in using eHealth resources. To improve decision-making for individuals with limited eHealth literacy, both analytic and context-specific approaches are recommended. Effective communication through family meetings, using listening and questioning skills and providing patient-centered explanations, can enhance understanding and facilitate shared decision-making. Additionally, proactive measures, such as using various materials, realia, and decision-making aids, can improve literacy in the health, computer, and scientific dimensions.

In summary, recognizing the time constraints and high-stakes nature of ICU decision-making, it is crucial to strike a balance between leveraging eHealth tools for efficiency and ensuring inclusivity. Health professionals should adopt a patient-centered approach, considering the diverse literacy levels and needs of individuals, especially those with limited eHealth literacy. Tailoring communication strategies, providing diverse materials, and assessing eHealth literacy levels using tools like the eHealth Literacy Scale are essential steps toward enhancing the effectiveness of eHealth decision resources in the ICU setting.

Conflicts of Interest

None declared.

References

1. Critical care statistics. Society of Critical Care Medicine. 2021. URL: https://www.sccm.org/Clinical-Resources/Guidelines/Guidelines [accessed 2024-02-21]
2. Current situation of medical institutions and statistics of hospital medical service volume e-book. In: Ministry of Health and Welfare. Taiwan. Ministry of Health and Welfare; 2020.
3. Abe T, Madotto F, Pham T, Nagata I, Uchida M, Tamiya N, et al. Epidemiology and patterns of tracheostomy practice in patients with acute respiratory distress syndrome in ICUs across 50 countries. Crit Care. 2018;22(1):195.
4. Miller JJ, Morris P, Files DC, Gower E, Young M. Decision conflict and regret among surrogate decision makers in the medical intensive care unit. J Crit Care. 2016;32:79-84.
5. Cai X, Robinson J, Muehlschlegel S, White DB, Holloway RG, Sheth KN, et al. Patient preferences and surrogate decision making in neuroscience intensive care units. Neurocrit Care. 2015;23(1):131-141.
6. Lilley EJ, Morris MA, Sadovnikoff N, Luxford JM, Changoor NR, Bystricky A, et al. "Taking over somebody's life": experiences of surrogate decision-makers in the surgical intensive care unit. Surgery. 2017;162(2):453-460.
7. Wu F, Zhuang Y, Chen X, Wen H, Tao W, Lao Y, et al. Decision-making among the substitute decision makers in intensive care units: an investigation of decision control preferences and decisional conflicts. J Adv Nurs. 2020;76(9):2323-2335.
8. Wynn R, Gabarron E, Johnsen JAK, Traver V. Special issue on e-health services. Int J Environ Res Public Health. 2020;17(8):2885.
9. Terrasse M, Gorin M, Sisti D. Social media, e-health, and medical ethics. Hastings Cent Rep. 2019;49(1):24-33.
10. Arvind S, Martin C. What is e-health? Eur Soc Cardiol. 2020;18(24):10.
11. Eysenbach G. What is e-health? J Med Internet Res. 2001;3(2):e20.
12. Azeez NA, Van der Vyver C. Security and privacy issues in e-health cloud-based system: a comprehensive content analysis. Egypt Inform J. 2019;20(2):97-108.
13. Ruppert L, Køster B, Siegert A, Cop C, Boyers L, Karimkhani C, et al. YouTube as a source of health information: analysis of sun protection and skin cancer prevention related issues. Dermatol Online J. 2017;23(1):1.
14. Swartzendruber A, Newton-Levinson A, Feuchs AE, Phillips AL, Hickey J, Steiner RJ. Sexual and reproductive health services and related health information on pregnancy resource center websites: a statewide content analysis. Womens Health Issues. 2018;28(1):14-20.
15. Norman CD, Skinner HA. eHealth literacy: essential skills for consumer health in a networked world. J Med Internet Res. 2006;8(2):e9.
16. Posiliti D, Cilliers L. An investigation of the level of e-health literacy in South Africa. ICICIS. 2019;12:261-271.
17. Stellefson M, Hanik B, Chaney B, Chaney D, Tennant B, Chavarria EA. eHealth literacy among college students: a systematic review with implications for eHealth education. J Med Internet Res. 2011;13(4):e102.
18. Lee J, Lee EH, Chae D. eHealth literacy instruments: systematic review of measurement properties. J Med Internet Res. 2021;23(11):e30644.
19. Koo M, Norman CD, Hsiao-Mei C. Psychometric evaluation of a Chinese version of the eHealth Literacy Scale (eHEALS) in school age children. Int Electron J Health Educ. 2012;15:29-36.
20. Norman CD, Skinner HA. eHEALS: the eHealth Literacy Scale. J Med Internet Res. 2006;8(4):e27.
21. Scheunemann LP, Ernecoff NC, Buddadhumaruk P, Carson SS, Hough CL, Curtis JR, et al. Clinician-family communication about patients' values and preferences in intensive care units. JAMA Intern Med. 2019;179(5):676-684.
22. Communication skills when providing decision support. The Ottawa Hospital. 2022. URL: https://decisionaid.ohri.ca/docs/decision_coaching_communication.pdf [accessed 2024-02-21]
23. The SHARE approach—essential steps of shared decisionmaking: expanded reference guide with sample conversation starters. Agency for Healthcare Research and Quality. 2020. URL: https://www.ahrq.gov/health-literacy/professional-training/shared-decision/tool/resource-2.html [accessed 2024-02-21]
24. Liu RW, Ceng BL, Ckang JN. Principles of evidence-based decision aid development. J Healthc Qual. 2017;11(4):25-30.
25. Baxter K, Mann R, Birks Y, Overton L. A scoping review of evidence on the use and effectiveness of decision aids in adult social care. J Long Term Care. 2021:100-113.

Abbreviations

CT: computed tomography; GCS: Glasgow Coma Scale; ICT: information and communication technology; ICU: intensive care unit; WHO: World Health Organization

Edited by A Mavragani; submitted 08.03.23; peer-reviewed by A Van Dongen, Y Chu, C Hudak; comments to author 21.06.23; revised version received 23.11.23; accepted 29.01.24; published 01.04.24.

©Wan-Na Sun, Chi-Yin Kao. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 01.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

Michiel Schreurs 1,2,3, Supinya Piampongsant 1,2,3, Miguel Roncoroni 1,2,3, Lloyd Cool 1,2,3,4, Beatriz Herrera-Malaver 1,2,3, Christophe Vanderaa 4, Florian A. Theßeling 1,2,3, Łukasz Kreft 5, Alexander Botzki 5, Philippe Malcorps 6, Luk Daenen 6, Tom Wenseleers 4 & Kevin J. Verstrepen 1,2,3

Nature Communications volume 15, Article number: 2368 (2024)


Subjects: Chemical engineering, Gas chromatography, Machine learning, Metabolomics, Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .
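As a rough illustration of this modeling approach (not the authors' actual pipeline, features, or hyperparameters), the scikit-learn sketch below trains a gradient boosting regressor to predict an appreciation score from a table of compound concentrations; the file and column names are hypothetical placeholders.

```python
# Illustrative sketch only: predicting an appreciation score from chemical
# profiles with gradient boosting, in the spirit of the study described above.
# File and column names are placeholders; the paper's actual features,
# hyperparameters, and validation scheme are not reproduced here.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

data = pd.read_csv("beer_chemistry_and_sensory.csv")   # hypothetical input table
X = data.drop(columns=["appreciation"])                # ~200 chemical properties
y = data["appreciation"]                               # panel or consumer score

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean cross-validated R^2: {scores.mean():.2f}")

# Fitting on all data exposes feature importances, a first (crude) way to
# spot compounds that drive the predicted appreciation.
model.fit(X, y)
top = pd.Series(model.feature_importances_, index=X.columns).nlargest(10)
print(top)
```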

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.
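A small pandas sketch of how such a dataset can be summarized, including the 4.2% ABV cutoff used to label non- and low-alcoholic beers, might look as follows; the file and column names are hypothetical.

```python
# Illustrative sketch: summarizing the beer dataset by style and flagging
# non-/low-alcohol beers with the 4.2% ABV cutoff mentioned above.
# File and column names are hypothetical placeholders.
import pandas as pd

beers = pd.read_csv("beer_dataset.csv")  # columns assumed: beer_id, style, abv

beers["low_or_no_alcohol"] = beers["abv"] <= 4.2
style_share = beers["style"].value_counts(normalize=True).mul(100).round(1)

print(style_share.head(5))                      # e.g. Blond and Tripel near the top
print(beers["low_or_no_alcohol"].mean() * 100)  # share of low/no-alcohol beers (%)
```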

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt or from other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids, total esters, hop aroma, and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 . For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol, and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate, conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol), correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .

Figure 1: Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.
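A minimal sketch of how such a compound-by-compound Spearman correlation matrix can be computed with pandas is shown below; the input file and compound column names are hypothetical stand-ins for the study's chemical dataset.

```python
# Illustrative sketch: Spearman rank correlations between measured compounds,
# analogous to the matrix summarized in Figure 1. Input file and column names
# are hypothetical placeholders.
import pandas as pd

chem = pd.read_csv("beer_chemical_profiles.csv", index_col="beer_id")

# Pairwise Spearman correlation across all measured compounds.
corr = chem.corr(method="spearman")

# Example: inspect how two hop aroma compounds relate to the bittering acids.
print(corr.loc["citronellol", "alpha_terpineol"])
print(corr.loc[["citronellol", "alpha_terpineol"], "iso_alpha_acids"])
```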

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data 3) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA; in 95% of cases, no significant difference was found across sessions (p > 0.05), indicating good consistency (Supplementary Table S2).

Aroma and taste perception reported by the trained panel are often linked (Fig. 1, bottom left panel and Supplementary Data 4 and 5), with high correlations between hop aroma and taste (Spearman's rho = 0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman's rho = 0.80 and 0.69), and particularly with “grassy” noble hops (Spearman's rho = 0.75). Barnyard flavor, most often associated with sour beers, is detected together with stale hop character (Spearman's rho = 0.97), as stale hops are typically used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman's rho = 0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman's rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman's rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol and carbonation; Spearman's rho = 0.32 and 0.39), as well as with hop and ester aroma intensity (Spearman's rho = 0.39 and 0.35).

Similar to the chemical analyses, sensory analyses confirmed typical features of specific beer styles (Supplementary Fig. S4). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is detected most strongly among Scotch ales, stouts/porters and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract), fall in the middle of the range. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig. S3), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aromas or tastes, as evaluated by the tasting panel (Fig. 2, Supplementary Fig. S5, Supplementary Data 6). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman's rho = 0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman's rho = 0.82/0.62 and 0.72/0.57, respectively), and darker color from roasted malts is a good indicator of malt perception (Spearman's rho = 0.54).

Fig. 2: Heatmap colors indicate Spearman's rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data 6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests the need for models that go beyond simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho = 0.49, Supplementary Fig. S6), but not for our trained tasting panel (rho = 0.19). This suggests that price affects consumer appreciation, as has been reported for wine 63 , while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3, rho = 0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences (in appreciation, among other aspects) between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

Fig. 3: RateBeer text mining results can be found in Supplementary Data 7. Rho values shown are Spearman correlation values, with asterisks indicating significant correlations (p < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data 7). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like ester, coriander or diacetyl are much weaker, as these attributes are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): three linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), and partial least squares regression (PLSR)), five decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), one support vector regression model (SVR), and one artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on the training set, its performance was evaluated by its ability to predict the test set using multi-output models (based on the coefficient of determination, see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table 1). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R 2 values, due to severe overfitting (training set R 2  = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially when interaction terms further amplify the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
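To make this evaluation scheme concrete, the sketch below trains two of the model families named above on a style-stratified 70/30 split and reports the multi-output test-set R2. This is a minimal illustration, not the study's code: the data are synthetic and the variable names are placeholders (in the study, X would hold the chemical parameters and Y the z-scored sensory attributes).

```python
# Hypothetical sketch of the model-comparison loop described above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_beers, n_chem, n_sensory = 250, 20, 5
X = rng.normal(size=(n_beers, n_chem))                 # placeholder chemical profiles
Y = X[:, :n_sensory] * 0.5 + rng.normal(scale=0.5, size=(n_beers, n_sensory))
styles = rng.integers(0, 8, size=n_beers)              # placeholder style labels

# 70/30 split, stratified by beer style
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X, Y, test_size=0.3, stratify=styles, random_state=42)

models = {
    "GBR": MultiOutputRegressor(GradientBoostingRegressor(random_state=42)),
    "RF": RandomForestRegressor(random_state=42),      # natively multi-output
}
for name, model in models.items():
    model.fit(X_tr, Y_tr)
    r2 = r2_score(Y_te, model.predict(X_te))           # multi-output R2 (uniform average)
    print(f"{name}: test R2 = {r2:.2f}")
```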

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table  S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
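As a minimal sketch of these two attribution approaches, the code below fits a gradient boosting regressor on synthetic data, extracts the impurity-based (MDI) importances, and aggregates per-sample SHAP values into a global ranking. The data, feature names and model settings are illustrative assumptions, not the study's measurements or pipeline.

```python
# Hypothetical sketch: impurity-based importance versus SHAP-based importance.
import numpy as np
import pandas as pd
import shap                                    # pip install shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
features = [f"compound_{i}" for i in range(10)]      # placeholder names
X = pd.DataFrame(rng.normal(size=(200, 10)), columns=features)
y = 2 * X["compound_0"] + X["compound_1"] ** 2 + rng.normal(scale=0.3, size=200)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

# 1) Impurity-based feature importance (mean decrease in impurity, MDI)
mdi = pd.Series(gbr.feature_importances_, index=features).sort_values(ascending=False)
print(mdi.head())

# 2) SHAP values: per-sample contributions, aggregated into a global ranking
explainer = shap.TreeExplainer(gbr)
shap_values = explainer.shap_values(X)
shap_rank = pd.Series(np.abs(shap_values).mean(axis=0),
                      index=features).sort_values(ascending=False)
print(shap_rank.head())
# shap.summary_plot(shap_values, X)            # beeswarm plot, analogous to Fig. 4B
```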

Fig. 4: (A) The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest-ranked chemical properties are shown. (B) SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. (C) Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman rho correlation coefficient and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, the latter an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would likely have been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
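A minimal sketch of how such partial dependence plots can be generated with scikit-learn is shown below; the model, data and feature indices are synthetic placeholders rather than the study's top predictors.

```python
# Hypothetical sketch: one-way and two-way partial dependence plots.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6))                               # placeholder chemical data
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.2, size=300)
gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

# One-way plots for two features, plus one two-way interaction plot
PartialDependenceDisplay.from_estimator(gbr, X, features=[0, 1, (1, 2)])
plt.tight_layout()
plt.show()
```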

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).
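The robustness check described above can be sketched as a simple loop over random seeds, tracking how often each feature ranks first. The snippet below uses synthetic data and 10 iterations instead of the 100 used in the study, so it illustrates the procedure rather than reproducing it.

```python
# Hypothetical sketch: stability of the top-ranked predictor across refits.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
features = [f"compound_{i}" for i in range(8)]        # placeholder names
X = pd.DataFrame(rng.normal(size=(250, 8)), columns=features)
y = 1.5 * X["compound_0"] + X["compound_1"] + rng.normal(scale=0.4, size=250)

top_features = []
for seed in range(10):
    # subsample < 1 makes each seeded fit stochastic
    gbr = GradientBoostingRegressor(random_state=seed, subsample=0.8).fit(X, y)
    ranking = pd.Series(gbr.feature_importances_, index=features).sort_values(ascending=False)
    top_features.append(ranking.index[0])

print(pd.Series(top_features).value_counts())         # how often each feature ranks first
```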

Next, we investigated whether combining the RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer-only model, both without and with a dataset identifier (R 2  = 0.67 for the RateBeer-only model versus 0.26 and 0.42, respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig. S9), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, as in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performance and reliability. In addition, it seems reasonable to assume that the two datasets are fundamentally different, with the panel dataset obtained through blind tastings by a trained professional panel under controlled conditions, unlike the online consumer reviews.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R 2  = 0.66 with style information vs R 2  = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Tables S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.
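For illustration, style identifiers can be encoded as explicit model parameters by appending one-hot columns to the chemical feature matrix, as in the hypothetical sketch below (column names and values are placeholders, not the study's data).

```python
# Hypothetical sketch: appending one-hot style identifiers to chemical features.
import pandas as pd

chem = pd.DataFrame({"ethanol": [5.0, 0.3, 6.5], "iso_alpha_acids": [20, 5, 35]})
styles = pd.Series(["Blond", "Low/no-alcohol", "IPA"], name="style")

X_with_style = pd.concat([chem, pd.get_dummies(styles, prefix="style")], axis=1)
print(X_with_style.columns.tolist())
```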

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the largest style category in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data 1).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

Fig. 5: Adding the top chemical compounds, identified as the best predictors of appreciation by our model, to poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. (A) Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. (B) For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups.

A limited set of studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gap between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best-performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g. bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attributes (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that, amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity- and SHAP-based methods, as we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho = 0.77, 0.72 and 0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho = 0.77 and 0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single-parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ considerably, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even though our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current model in accurately predicting products that are appreciated very badly. Finally, while models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and corresponding diversity in chemical composition and aroma. See Supplementary Fig.  S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate the CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, and Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of a 2-heptanol (Sigma-Aldrich, H3003) internal standard solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (flow rate, 35 cm/s; injection volume, 1000 µL; injection mode, split; Combi PAL autosampler, CTC Analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min, then raised to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C (held for 3 min) and a final ramp of 4 °C/min until 230 °C (held for 1 min). Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low-polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, with a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min, then raised to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min until 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries were used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least square analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.
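The deconvolution step can be illustrated, in heavily simplified form, by a non-negative least squares fit of known reference spectra to an observed mixture spectrum. The sketch below is only a conceptual analogue in Python (the study used an in-house R script), and all numbers are invented.

```python
# Conceptual sketch: decompose an observed spectrum into non-negative
# contributions of two assumed reference compound spectra.
import numpy as np
from scipy.optimize import nnls

# Columns of S: reference spectra of two co-eluting compounds (placeholders)
S = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
mixture = np.array([0.62, 0.50, 0.42])        # observed spectrum at one scan
coefficients, residual = nnls(S, mixture)     # non-negative contribution of each compound
print(coefficients)
```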

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific™ Gallery™ Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table S7 and Supplementary Table S9.

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
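A minimal example of this calculation with pandas is shown below; the column names and values are placeholders for the measured chemical properties.

```python
# Minimal sketch: pairwise Spearman rank correlations between chemical properties.
import pandas as pd

chem = pd.DataFrame({
    "ethanol": [5.2, 0.4, 6.8, 8.1],
    "glycerol": [1.6, 0.3, 2.1, 2.5],
    "iso_alpha_acids": [18.0, 6.0, 35.0, 25.0],
})
spearman_matrix = chem.corr(method="spearman")   # symmetric correlation matrix
print(spearman_matrix.round(2))
```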

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attributes’ intensity. The scoring sheet is included as Supplementary Data 3. Sensory assessments took place between 10 a.m. and 12 p.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples across different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8).
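Two of these processing steps, per-taster z-scoring and the ANOVA-based consistency check, can be sketched as follows; the scores are invented and the setup is simplified to a single attribute and two sessions.

```python
# Hypothetical sketch: per-taster z-scoring and a one-way ANOVA across sessions.
import pandas as pd
from scipy.stats import f_oneway

scores = pd.DataFrame({
    "taster": ["A", "A", "B", "B", "A", "B"],
    "session": [1, 1, 1, 1, 2, 2],
    "bitterness": [5, 3, 6, 4, 4, 5],
})

# (1) z-score each taster's ratings (mean-centered, scaled by std)
scores["bitterness_z"] = scores.groupby("taster")["bitterness"].transform(
    lambda s: (s - s.mean()) / s.std())

# (2) panel consistency: compare the same attribute across repeated sessions
session1 = scores.loc[scores["session"] == 1, "bitterness"]
session2 = scores.loc[scores["session"] == 2, "bitterness"]
stat, p = f_oneway(session1, session2)
print(f"ANOVA p-value across sessions: {p:.2f}")
```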

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean = 922, min = 6, max = 5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms, for example ‘floral’ and ‘flower’, was created, and such terms were collapsed into a single term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.
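The TF-IDF step can be sketched as follows, assuming the review sentences have already been cleaned and filtered to taste and aroma descriptions; the texts are invented examples rather than actual RateBeer reviews.

```python
# Hypothetical sketch: TF-IDF enrichment scores for sensory words per beer.
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

beer_texts = {
    "beer_A": "bitter hoppy citrus bitter resinous",
    "beer_B": "sour tart lactic funky barnyard",
    "beer_C": "sweet malty caramel toffee sweet",
}
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(beer_texts.values())
scores = pd.DataFrame(tfidf.toarray(),
                      index=list(beer_texts),
                      columns=vectorizer.get_feature_names_out())
print(scores.round(2))   # higher values = term enriched in that beer's reviews
```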

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, three linear regression-based models (linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR)), five decision tree models (AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR)), one support vector machine model (SVR) and one artificial neural network model (ANN) were trained. The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R 2 ) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
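A condensed, hypothetical sketch of this preprocessing and training recipe is shown below (log-transform of non-normal features, mean imputation, a style-stratified split, scaling fitted on training data, and a cross-validated grid search over a gradient boosting regressor). The data, column names and hyperparameter grid are placeholders, not the study's pipeline.

```python
# Hypothetical sketch of the preprocessing and model-training recipe.
import numpy as np
import pandas as pd
from scipy.stats import shapiro
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.lognormal(size=(250, 12)),
                 columns=[f"compound_{i}" for i in range(12)])
y = np.log(X["compound_0"]) + rng.normal(scale=0.3, size=250)
styles = rng.integers(0, 8, size=250)            # placeholder style labels

# Log-transform columns that fail a Shapiro-Wilk normality test (p < 0.05)
for col in X.columns:
    stat, p = shapiro(X[col])
    if p < 0.05:
        X[col] = np.log(X[col])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=styles, random_state=42)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # fill missing measurements
    ("scale", StandardScaler()),                  # fitted on training data only
    ("gbr", GradientBoostingRegressor(random_state=42)),
])
grid = {"gbr__n_estimators": [100, 300], "gbr__max_depth": [2, 3]}
search = GridSearchCV(pipe, grid, cv=5, scoring="r2").fit(X_tr, y_tr)
print(f"best CV R2 = {search.best_score_:.2f}, test R2 = {search.score(X_te, y_te):.2f}")
```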

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R 2 values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses, as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
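For illustration, such a test can be computed with scipy; the counts below are hypothetical, not results from the study.

```python
# Sketch of a two-sided binomial test for one tasting attribute: does the
# number of tasters preferring the spiked sample deviate from a 50/50 split?
from scipy.stats import binomtest

n_tasters = 20           # hypothetical panel size for one comparison
prefer_spiked = 15       # hypothetical count preferring the spiked beer
result = binomtest(prefer_spiked, n=n_tasters, p=0.5, alternative="two-sided")
print(f"p = {result.pvalue:.3f}")
```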

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access; they are not publicly available as they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

CAS   Google Scholar  

Roncoroni, Miguel & Verstrepen, Kevin Joan. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Article   PubMed   PubMed Central   Google Scholar  

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Google Scholar  

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-Alcohol. beer Prod. – Overv. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. A spoonful of sugar helps the medicine go down”: Bitter masking bysucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Gastón Ares. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Article   MathSciNet   Google Scholar  

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcoholmediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Article   ADS   Google Scholar  

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Download references

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen


Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen.

Ethics declarations

Competing interests

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Supplementary Information
  • Peer Review File
  • Description of Additional Supplementary Files
  • Supplementary Data 1–7
  • Reporting Summary
  • Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat. Commun. 15, 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0


Received: 30 October 2023

Accepted: 21 February 2024

Published: 26 March 2024

DOI: https://doi.org/10.1038/s41467-024-46346-0



Winners of national penmanship contest crowned as handwriting is 'having a moment'

Namuun Baasanbold poses for a photo with her handwriting.

It’s regarded, hands down, as the Super Bowl of penmanship tournaments.

The Zaner-Bloser National Handwriting Contest, now in its 33rd year, crowned its 2024 grand champions on Monday, rewarding nine students from six states for their picture-perfect letters.

Ten-year-old Zita Miller of White Bear Lake, Minnesota, took top honors in the fifth grade category. Her winning submission was one of the contest’s 80,000 entries.

“I like handwriting because it’s like art, drawing swirls and vines and curls,” Miller said, adding that she enjoys penning original mystery stories by hand.

Zita Miller poses for a photograph.

Namuun Baasanbold, from Carmel, Indiana, was named grand champion in the first grade category, and said she likes to give handwritten “love notes” to family and friends.

“Writing by hand makes me feel special,” she said.

The contest celebrates a centuries-old practice, but the victories come as handwriting is experiencing a kind of renaissance in the U.S. In January, California became the 22nd state to require cursive to be taught in schools — a significant jump from 2016, when just 12 states mandated it.

At the same time, various studies published over the past decade have detailed how writing with pencil and paper can benefit memory, cognitive development, reading comprehension and fine motor skills.

“Handwriting is definitely having a moment,” said Sharon Quirk-Silva, a member of the California state Assembly who sponsored the bill. She said she heard from people from all over the country who penned “beautifully handwritten notes” of support for the new law.

“We live in a very polarized nation. So many issues are contentious. But with this handwriting bill, we had full bipartisan support and goodwill. The importance of handwriting is something people seem to agree on,” she said.

Quirk-Silva said she backed the bill, in large part, because of her own experience — before becoming a lawmaker, she taught elementary school for 25 years.

“For years, technology has been taking over the curriculum in schools, with many kids being dormant in front of the screen, using two or three screens a day. Now, there’s a feeling of, ‘Let’s get pens and pencils back in kids’ hands,’” she said.

Although the California law mandates that first through sixth graders in the state receive cursive instruction, Quirk-Silva said she believes that writing by hand — in print or cursive — is an important language arts tool.

“It’s a way of slowing down a little bit, taking your thoughts from your brain to your hand and physically doing the writing,” she said.

Sophia Vinci-Booher, an assistant professor of psychology and human development at Vanderbilt University, said her research found that writing by hand enabled preschool students to form connections in the brain that likely support early letter recognition.

For that study, published in 2016, 20 children were asked to practice certain letters by writing them over and over, and to practice others by pressing a button.

“Then we asked the children to go into an MRI scanner and look at those letters they’d been practicing,” Vinci-Booher explained. Her team analyzed the children’s brain activity to assess the functional connectivity between different areas of their brains.

“We found that the connection was stronger with letters they wrote by hand than those they tapped,” she said.
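In this context, “functional connectivity” describes how strongly the measured activity of different brain regions rises and falls together. As a minimal illustration of the idea only — not the study’s actual analysis pipeline, and using made-up data — the connectivity between two regions of interest can be summarised as the correlation between their time courses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fMRI-like time series for two hypothetical regions of interest (ROIs),
# 200 time points each. Purely illustrative data, not from the study.
shared_signal = rng.standard_normal(200)
roi_a = shared_signal + 0.5 * rng.standard_normal(200)
roi_b = shared_signal + 0.5 * rng.standard_normal(200)

# Functional connectivity is commonly summarised as the Pearson correlation
# between two regions' time courses: values near 1 indicate tightly coupled activity.
connectivity = np.corrcoef(roi_a, roi_b)[0, 1]
print(f"Functional connectivity (Pearson r): {connectivity:.2f}")
```

A stronger correlation for letters the children had written by hand than for letters they had tapped is, in essence, what “the connection was stronger” means here.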

The research underscores the importance of the physical act of forming symbols, Vinci-Booher added.

“Writing by hand is a good thing for kids because it supports early reading development and it engages the fine motor system, which is developmentally important,” she said.

A 2021 study measured people’s brain activity during a memory task, this time finding that University of Tokyo students showed stronger activity and better recall after writing information down on paper than after entering it on a smartphone or with a stylus on a tablet. The researchers suggested that the physical act of writing on paper provides the brain with more details that trigger memory, and concluded that using paper notebooks can help students retain information in part because of their “tangible permanence.”

A similar study published in January compared the brain activity of students at the Norwegian University of Science and Technology who took notes by hand to the activity of those who typed their notes. The findings suggested that the students who wrote by hand had higher levels of electrical activity across a wide range of brain regions responsible for sensory processing and memory.

The results come as little surprise to many educators.

“I’ve seen firsthand that the kids learn more when they write by hand,” said Geeta Kadakia, who teaches second through fifth grade at the DAV Montessori School in Houston. “The lightbulb goes off through those achievements in handwriting, and handwriting leads to achievements in other areas, even math. When students make their numbers more neatly, their math scores improve.”

Laura Gajderowicz taught elementary school for 33 years in Indiana before retiring in 2022. She said she worried as she watched handwriting take a back seat to technology in U.S. classrooms in the early 2000s.

“Writing by hand does so much to help with the development of a student’s eye-hand coordination,” Gajderowicz said, adding: “I’m not against technology — I just think there’s a place at the table for both technology and handwriting when it comes to learning.”

This year, Gajderowicz served as a regional judge in the Zaner-Bloser contest.

“I was pleasantly surprised to see how many entries we had, especially from children in the upper grades,” she said.

Gajderowicz selected winners using criteria that analyzed the mechanics and precision of the letters students wrote, including their shapes, sizes, slant and spacing.

Contestants were asked to write the sentence, “The quick brown fox jumps over the lazy dog,” because it includes the entire alphabet, as well as a sentence explaining why handwriting makes them a better reader and writer.
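That sentence is used because it is a pangram: every letter of the alphabet appears at least once, so judges can compare how a contestant forms all 26 letters. A quick, self-contained check of that property:

```python
import string

def is_pangram(sentence: str) -> bool:
    """Return True if every letter a-z appears in the sentence at least once."""
    letters = {ch for ch in sentence.lower() if ch in string.ascii_lowercase}
    return letters == set(string.ascii_lowercase)

print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
```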

Namuun Baasanbold’s entry.

Baasanbold said she was “over the moon” to find out she won: “I screamed and celebrated with friends at a restaurant with pizza and an appetizer and a sundae for dessert,” she said.

Her prizes include a trophy and $500 — plus bragging rights.

“I like to use my handwriting to impress people,” she said.

Mary Pflum is a national field producer for NBC News, based in New York.


Baltimore bridge collapse: a bridge engineer explains what happened, and what needs to change


Colin Caprani, Associate Professor, Civil Engineering, Monash University

Disclosure statement

Colin Caprani receives funding from the Department of Transport (Victoria) and the Level Crossing Removal Project. He is also Chair of the Confidential Reporting Scheme for Safer Structures - Australasia, Chair of the Australian Regional Group of the Institution of Structural Engineers, and Australian National Delegate for the International Association for Bridge and Structural Engineering.

Monash University provides funding as a founding partner of The Conversation AU.


When the container ship MV Dali, 300 metres long and massing around 100,000 tonnes, lost power and slammed into one of the support piers of the Francis Scott Key Bridge in Baltimore, the bridge collapsed in moments. Six people are presumed dead, several others injured, and the city and region are expecting a months-long logistical nightmare in the absence of a crucial transport link.

It was a shocking event, not only for the public but for bridge engineers like me. We work very hard to ensure bridges are safe, and overall the probability of being injured or worse in a bridge collapse remains even lower than the chance of being struck by lightning.

However, the images from Baltimore are a reminder that safety can’t be taken for granted. We need to remain vigilant.

So why did this bridge collapse? And, just as importantly, how might we make other bridges safer against such collapses?

A 20th century bridge meets a 21st century ship

The Francis Scott Key Bridge was built through the mid-1970s and opened in 1977. The main structure over the navigation channel is a “continuous truss bridge” in three sections or spans.

The bridge rests on four supports, two of which sit on either side of the navigable waterway. It is these two piers that are critical to protect against ship impacts.

And indeed, there were two layers of protection: a so-called “dolphin” structure made from concrete, and a fender. The dolphins are in the water about 100 metres upstream and downstream of the piers. They are intended to be sacrificed in the event of a wayward ship, absorbing its energy and being deformed in the process but keeping the ship from hitting the bridge itself.


The fender is the last layer of protection. It is a structure made of timber and reinforced concrete placed around the main piers. Again, it is intended to absorb the energy of any impact.

Fenders are not intended to absorb impacts from very large vessels. And so when the MV Dali, weighing more than 100,000 tonnes, made it past the protective dolphins, it was simply far too massive for the fender to withstand.


Video recordings show a cloud of dust appearing just before the bridge collapsed, which may well have been the fender disintegrating as it was crushed by the ship.

Once the massive ship had made it past both the dolphin and the fender, the pier – one of the bridge’s four main supports – was simply incapable of resisting the impact. Given the size of the vessel and its likely speed of around 8 knots (15 kilometres per hour), the impact force would have been around 20,000 tonnes.
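The article does not show how a figure of that size is reached, but a back-of-the-envelope sketch gives a sense of the scale. The snippet below computes the ship’s kinetic energy and an equivalent static head-on impact force using the commonly cited AASHTO-style formula Ps ≈ 0.122 · V · √DWT (force in meganewtons, speed in m/s, deadweight tonnage in tonnes). Treating the ship’s quoted 100,000-tonne mass as its deadweight tonnage is an assumption, so the result is an order-of-magnitude estimate only.

```python
import math

# Rough figures from the article (assumptions, not measured values)
mass_tonnes = 100_000            # MV Dali, roughly 100,000 tonnes
speed_knots = 8
speed_ms = speed_knots * 0.5144  # 1 knot = 0.5144 m/s -> about 4.1 m/s

# Kinetic energy the protection system would have to absorb
kinetic_energy_mj = 0.5 * (mass_tonnes * 1000) * speed_ms**2 / 1e6
print(f"Kinetic energy: ~{kinetic_energy_mj:,.0f} MJ")  # ~850 MJ

# AASHTO-style equivalent static head-on impact force (assumed formula):
#   Ps [MN] ~= 0.122 * V [m/s] * sqrt(DWT [tonnes])
force_mn = 0.122 * speed_ms * math.sqrt(mass_tonnes)
force_tonne_force = force_mn * 1e6 / (9.81 * 1000)      # MN -> tonne-force
print(f"Equivalent static force: ~{force_mn:,.0f} MN (~{force_tonne_force:,.0f} tonne-force)")
```

The force works out to roughly 160 MN, or about 16,000 tonne-force, in the same range as the figure quoted above – far beyond what a timber-and-concrete fender can absorb.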

Bridges are getting safer

This was not the first time a ship hit the Francis Scott Key Bridge. There was another collision in 1980, damaging a fender badly enough that it had to be replaced.

Around the world, 35 major bridge collapses resulting in fatalities were caused by collisions between 1960 and 2015, according to a 2018 report from the World Association for Waterborne Transport Infrastructure. Collisions between ships and bridges in the 1970s and early 1980s led to a significant improvement in the design rules for protecting bridges from impact.


The International Association for Bridge and Structural Engineering’s Ship Collision with Bridges guide, published in 1993, and the American Association of State Highway and Transportation Officials’ Guide Specification and Commentary for Vessel Collision Design of Highway Bridges (1991) changed how bridges were designed.

In Australia, the Australian Standard for Bridge Design (published in 2017) requires designers to think about the biggest vessel likely to come along in the next 100 years, and what would happen if it were heading for any bridge pier at full speed. Designers need to consider the result of both head-on collisions and side-on, glancing blows. As a result, many newer bridges protect their piers with entire human-made islands.

Of course, these improvements came too late to influence the design of the Francis Scott Key Bridge itself.

Lessons from disaster

So what are the lessons apparent at this early stage?

First, it’s clear the protection measures in place for this bridge were not enough to handle this ship impact. Today’s cargo ships are much bigger than those of the 1970s, and it seems likely the Francis Scott Key Bridge was not designed with a collision like this in mind.

So one lesson is that we need to consider how the vessels near our bridges are changing. This means we cannot just accept the structure as it was built, but ensure the protection measures around our bridges are evolving alongside the ships around them.


Second, and more generally, we must remain vigilant in managing our bridges. I’ve written previously about the current level of safety of Australian bridges, but also about how we can do better.

This tragic event only emphasises the need to spend more on maintaining our ageing infrastructure. This is the only way to ensure it remains safe and functional for the demands we put on it today.


