Enhancing students’ critical thinking skills: is comparing correct and erroneous examples beneficial?

  • Original Research
  • Open access
  • Published: 26 September 2021
  • Volume 49, pages 747–777 (2021)


  • Lara M. van Peppen (ORCID: orcid.org/0000-0002-1219-8267) 1, 2,
  • Peter P. J. L. Verkoeijen 1 , 3 ,
  • Anita E. G. Heijltjes 3 ,
  • Eva M. Janssen 4 &
  • Tamara van Gog 4  


There is a need for effective methods to teach critical thinking (CT). One instructional method that seems promising is comparing correct and erroneous worked examples (i.e., contrasting examples). The aim of the present study, therefore, was to investigate the effect of contrasting examples on learning and transfer of CT-skills, focusing on avoiding biased reasoning. Students ( N  = 170) received instructions on CT and avoiding biases in reasoning tasks, followed by: (1) contrasting examples, (2) correct examples, (3) erroneous examples, or (4) practice problems. Performance was measured on a pretest, immediate posttest, 3-week delayed posttest, and 9-month delayed posttest. Our results revealed that participants’ reasoning task performance improved from pretest to immediate posttest, and even further after a delay (i.e., they learned to avoid biased reasoning). Surprisingly, there were no differences in learning gains or transfer performance between the four conditions. Our findings raise questions about the preconditions of contrasting examples effects. Moreover, how transfer of CT-skills can be fostered remains an important issue for future research.



Introduction

Every day, we reason and make many decisions based on previous experiences and existing knowledge. To do so we often rely on a number of heuristics (i.e., mental shortcuts) that ease reasoning processes (Tversky & Kahneman, 1974 ). Usually, these decisions are inconsequential but sometimes they can lead to biases (i.e., deviating from ideal normative standards derived from logic and probability theory) with severe consequences. To illustrate, a forensic expert who misjudges fingerprint evidence because it verifies his or her preexisting beliefs concerning the likelihood of the guilt of a defendant, displays the so-called confirmation bias, which can result in a misidentification and a wrongful conviction (e.g., the Madrid bomber case; Kassin et al., 2013 ). Biases occur when people rely on heuristic reasoning (i.e., Type 1 processing) when that is not appropriate, do not recognize the need for analytical or reflective reasoning (i.e., Type 2 processing), are not willing to switch to Type 2 processing or unable to sustain it, or miss the relevant mindware to come up with a better response (e.g., Evans, 2003 ; Stanovich, 2011 ). Our primary tool for reasoning and making better decisions, and thus to avoid biases in reasoning and decision making, is critical thinking (CT), which is generally characterized as “purposeful, self-regulatory judgment that results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations on which that judgment is based” (Facione, 1990 , p. 2).

Because CT is essential for successful functioning in one’s personal, educational, and professional life, fostering students’ CT has become a central aim of higher education (Davies, 2013; Halpern, 2014; Van Gelder, 2005). However, several large-scale longitudinal studies cast doubt on whether this laudable aim is realized merely by following a higher education degree program. These studies revealed that the CT-skills of many higher education graduates are insufficiently developed (e.g., Arum & Roksa, 2011; Flores et al., 2012; Pascarella et al., 2011; although a more recent meta-analytic study reached the more positive conclusion that students do improve their CT-skills over the college years: Huber & Kuncel, 2016). Hence, there is a growing body of literature on how to teach CT (e.g., Abrami et al., 2008, 2014; Van Peppen et al., 2018, 2021; Angeli & Valanides, 2009; Niu et al., 2013; Tiruneh et al., 2014, 2016).

However, there are different views on the best way to teach CT, the most well-known debate being whether CT should be taught in a general or content-specific manner (Abrami et al., 2014; Davies, 2013; Ennis, 1989; Moore, 2004). This debate has faded in recent years, as most researchers now agree that CT involves both general skills (e.g., sound argumentation, evaluating statistical information, and evaluating the credibility of sources) and specific skills or knowledge used in the context of disciplines (e.g., diagnostic reasoning). Indeed, it has been shown that the most effective teaching methods combine generic instruction on CT with the opportunity to integrate the general principles that were taught with domain-specific subject matter. It is well established, for instance, that explicit teaching of CT combined with practice improves learning of the CT-skills required for unbiased reasoning (e.g., Abrami et al., 2008; Heijltjes et al., 2014b). However, while some effective teaching methods have been identified, it is as yet unclear under which conditions transfer of CT-skills across tasks or domains, that is, the ability to apply acquired knowledge and skills to a new context or related materials (e.g., Barnett & Ceci, 2002), can be promoted.

Transfer has been described as existing on a continuum from near to far, with lower degrees of similarity between the initial and transfer situation along the way (Salomon & Perkins, 1989 ). Transferring knowledge or skills to a very similar situation, for instance problems in an exam of the same kind as practiced during the lessons, refers to ‘near’ transfer. By contrast, transferring between situations that share similar structural features but, on appearance, seem remote and alien to one another is considered ‘far’ transfer.

Previous research has shown that the CT-skills required for unbiased reasoning consistently failed to transfer to novel problem types (i.e., far transfer), even when using instructional methods that proved effective for fostering transfer in various other domains (Van Peppen et al., 2018, 2021; Heijltjes et al., 2014a, 2014b, 2015; for CT-skills more generally, see Halpern, 2014; Ritchhart & Perkins, 2005; Tiruneh et al., 2014, 2016). This lack of transfer of CT-skills is worrisome because it would be unfeasible to train students on each and every type of reasoning bias they will ever encounter. CT-skills acquired in higher education should transfer to other domains and to the job and, therefore, it is crucial to acquire more knowledge on how transfer of these skills can be fostered (and this also applies to CT-skills more generally, see for example, Halpern, 2014; Beaulac & Kenyon, 2014; Lai, 2011; Ritchhart & Perkins, 2005). One instructional method that seems promising is comparing correct and erroneous worked examples (i.e., contrasting examples; e.g., Durkin & Rittle-Johnson, 2012).

Benefits of studying examples

Over the last decades, a large body of research has investigated learning from studying worked examples as opposed to unsupported problem solving. Worked examples consist of a problem statement and an entirely and correctly worked-out solution procedure (in this paper referred to as correct examples; Renkl, 2014 ; Renkl et al., 2009 ; Sweller et al., 1998 ; Van Gog et al., 2019 ). Typically, studying correct examples is more beneficial for learning than problem-solving practice, especially in initial skill acquisition (for reviews, see Atkinson et al., 2003 ; Renkl, 2014 ; Sweller et al., 2011 ; Van Gog et al., 2019 ). Although this worked example effect has been mainly studied in domains such as mathematics and physics, it has also been demonstrated in learning argumentation skills (Schworm & Renkl, 2007 ), learning to reason about legal cases (Nievelstein et al., 2013 ) and medical cases (Ibiapina et al., 2014 ), and novices’ learning to avoid biased reasoning (Van Peppen et al., 2021 ).

The worked example effect can be explained by cognitive load imposed on working memory (Paas et al., 2003a ; Sweller, 1988 ). Cognitive Load Theory (CLT) suggests that—given the limited capacity and duration of our working memory—learning materials should be designed so as to decrease unnecessary cognitive load related to the presentation of the materials (i.e., extraneous cognitive load). Instead, learners’ attention should be devoted towards processes that are directly relevant for learning (i.e., germane cognitive load). When solving practice problems, novices often use general and weak problem-solving strategies that impose high extraneous load. During learning from worked examples, however, the high level of instructional guidance provides learners with the opportunity to focus directly on the problem-solving principles and their application. Accordingly, learners can use the freed up cognitive capacity to engage in generative processing (Wittrock, 2010 ). Generative processing involves actively constructing meaning from to-be-learned information, by mentally organizing it into coherent knowledge structures and integrating these principles with one’s prior knowledge (i.e., Grabowski, 1996 ; Osborne & Wittrock, 1983 ; Wittrock, 1974 , 1990 , 1992 , 2010 ). These knowledge structures in turn can aid future problem solving (Kalyuga, 2011 ; Renkl, 2014 ; Van Gog et al., 2019 ).

A recent study showed that the worked example effect also applies to novices’ learning to avoid biased reasoning (Van Peppen et al., 2021 Footnote 1 ): participants’ performance on isomorphic tasks on a final test improved after studying correct examples, but not after solving practice problems. However, studying correct examples was not sufficient to establish transfer to novel tasks that shared similar features with the isomorphic tasks, but on which participants had not acquired any knowledge during instruction/practice. The latter finding might be explained by the fact that students sometimes process worked examples superficially and do not spontaneously use the freed up cognitive capacity to engage in generative processing needed for successful transfer (Renkl & Atkinson, 2010 ). Another possibility is that these examples did not sufficiently encourage learners to make abstractions of the underlying principles and explore possible connections between problems (e.g., Perkins & Salomon, 1992 ). It seems that to fully take advantage of worked examples in learning unbiased reasoning, students should be encouraged to be actively involved in the learning process and facilitated to focus on the underlying principles (e.g., Van Gog et al., 2004 ).

The potential of erroneous examples

While most of the worked-example research focuses on correct examples, recent research suggests that students learn at a deeper level and may come to understand the principles behind solution steps better when (also) provided with erroneous examples (e.g., Adams et al., 2014 ; Barbieri & Booth, 2016 ; Booth et al., 2013 ; Durkin & Rittle-Johnson, 2012 ; McLaren et al., 2015 ). In studies involving erroneous examples, which are often preceded by correct examples (e.g., Booth et al., 2015 ), students are usually prompted to locate the incorrect solution step and to explain why this step is incorrect or to correct it. This induces generative processing, such as comparison with internally represented correct examples and (self-)explaining (e.g., Chi et al., 1994 ; McLaren et al., 2015 ; Renkl, 1999 ). Students are encouraged to go beyond noticing surface characteristics and to think deeply about how erroneous steps differ from correct ones and why a solution step is incorrect (Durkin & Rittle-Johnson, 2012 ). This might help them to correctly update schemas of correct concepts and strategies and, moreover, to create schemas for erroneous strategies (Durkin & Rittle-Johnson, 2012 ; Große & Renkl, 2007 ; Siegler, 2002 ; Van den Broek & Kendeou, 2008 ; VanLehn, 1999 ), reducing the probability of recurring erroneous solutions in the future (Siegler, 2002 ).

However, erroneous examples are typically presented separately from correct examples, requiring learners to use mental resources to recall the gist of the no longer visible correct solutions (e.g., Große & Renkl, 2007 ; Stark et al., 2011 ). Splitting attention across time increases the likelihood that mental resources will be expended on activities extraneous to learning, which subsequently may hamper learning (i.e., temporal contiguity effect: e.g., Ginns, 2006 ). One could, therefore, argue that the use of erroneous examples could be optimized by providing them side by side with correct examples (e.g., Renkl & Eitel, 2019 ). This would allow learners to focus on activities directly relevant for learning, such as structural alignment and detection of meaningful commonalities and differences between the examples (e.g., Durkin & Rittle-Johnson, 2012 ; Roelle & Berthold, 2015 ). Indeed, studies on comparing correct and erroneous examples revealed positive effects in math learning (Durkin & Rittle-Johnson, 2012 ; Kawasaki, 2010 ; Loibl & Leuders, 2018 , 2019 ; Siegler, 2002 ).

The present study

As indicated above, it is still an important open question which instructional strategy can be used to enhance transfer of CT-skills. To reiterate, previous research demonstrated that practice consisting of worked example study was more effective for novices’ learning than practice problem solving, but it was not sufficient to establish transfer. Recent research has demonstrated the potential of erroneous examples, which are often preceded by correct examples. Comparing correct and erroneous examples (from here on referred to as contrasting examples) presented side by side seems to hold considerable promise with respect to promoting generative processing and transfer. Hence, the purpose of the present study was to investigate whether contrasting examples of fictitious students’ solutions to ‘heuristics and biases tasks’ (a specific sub-category of CT-skills: e.g., Tversky & Kahneman, 1974) would be more effective for fostering learning and transfer than studying correct examples only, studying erroneous examples only, or solving practice problems. Performance was measured on a pretest, immediate posttest, 3-week delayed posttest, and 9-month delayed posttest (the latter for half of the participants, for practical reasons), to examine effects on learning and transfer.

Based on the literature presented above, we hypothesized that studying correct examples would impose less cognitive load (i.e., lower investment of mental effort during learning ) than solving practice problems (i.e., worked example effect: e.g., Van Peppen et al., 2021 ; Renkl, 2014 ; Hypothesis 1). Whether there would be differences in invested mental effort between contrasting examples, studying erroneous examples, and solving practice problems, however, is an open question. That is, it is possible that these instructional formats impose a similar level of cognitive load, but originating from different processes: while practice problem solving may impose extraneous load that does not contribute to learning, generative processing of contrasting or erroneous examples may impose germane load that is effective for learning (Sweller et al., 2011 ). As such, it is important to consider invested mental effort (i.e., experienced cognitive load) in combination with learning outcomes. Secondly, we hypothesized that students in all conditions would benefit from the CT-instructions combined with the practice activities, as evidenced by pretest to immediate posttest gains in performance on instructed and practiced items (i.e., learning : Hypothesis 2). Furthermore, based on cognitive load theory, we hypothesized that studying correct examples would be more beneficial for learning than solving practice problems (i.e., worked example effect: e.g., Van Peppen et al., 2021 ; Renkl, 2014 ). Based on the aforementioned literature, we expected that studying erroneous examples would promote generative processing more than studying correct examples. Whether that generative processing would actually enhance learning, however, is an open question. This can only be expected to be the case if learners can actually remember and apply the previously studied information on the correct solution, which arguably involves higher cognitive load (i.e., temporal contiguity effect) than studying correct examples or contrasting examples. As contrasting can help learners to focus on key information and thereby induces generative processes directly relevant for learning (e.g., Durkin & Rittle-Johnson, 2012 ), we expected that contrasting examples would be most effective. Thus, we predict the following pattern of results regarding performance gains on learning items (Hypothesis 3): contrasting examples > correct examples > practice problems. As mentioned above, it is unclear how the erroneous examples condition would compare to the other conditions.

Furthermore, we expected that generative processing would promote transfer. Despite findings of previous studies in other domains (e.g., Paas, 1992 ), we found no evidence in a previous study that studying correct examples or solving practice problems would lead to a difference in transfer performance (Van Peppen et al., 2021 ). Therefore, we predict the following pattern of results regarding performance on non-practiced items of the immediate posttest (i.e., transfer , Hypothesis 4): contrasting examples > correct examples ≥ practice problems. Again, it is unclear how the erroneous examples condition would compare to the other conditions.

We expected these effects (Hypotheses 3 and 4) to persist on the delayed posttests. As effects of generative processing (relative to non-generative learning strategies) sometimes increase as time goes by (Dunlosky et al., 2013 ), they may be even greater after a delay. For a schematic overview of the hypotheses, see Table 1 .

We created an Open Science Framework (OSF) page for this project, where all materials, the dataset, and all script files of the experiment are provided (osf.io/8zve4/).

Participants and design

Participants were 182 first-year ‘Public Administration’ and ‘Safety and Security Management’ students of a Dutch university of applied sciences (i.e., higher professional education), both programs being part of the Academy for Security and Governance. These students were approximately 20 years old (M = 19.53, SD = 1.91) and most of them were male (120 male, 62 female). Before entering these study programs, they had completed secondary education (senior general secondary education: n = 122, pre-university education: n = 7) or had followed further education (secondary vocational education: n = 28, higher professional education: n = 24, university education: n = 1).

Of the 182 students (i.e., the total number of students in these cohorts), 173 students (95%) completed the first experimental session (see Fig. 1 for an overview) and 158 students (87%) completed both the first and second experimental session. Additionally, 83 students (46%), all from the Safety and Security Management program, completed the 9-month delayed posttest during the first mandatory CT-lesson of their second study year (we had no access to another CT-lesson of the Public Administration program). The number of absentees during a lesson (about 15 in total) is quite common for mandatory lessons in these programs and is often due to illness or personal circumstances. Students who were absent during the first experimental session and returned for the second experimental session could not participate in the study because they had missed the intervention phase.

Fig. 1 Overview of the study design. The four conditions differed in practice activities during the practice phase

We defined a priori that participants would be excluded in case of excessively fast reading speed. Considering that even fast readers can read no more than 350 words per minute (e.g., Trauzettel-Klosinski & Dietz, 2012), and that the text of our instructions additionally required understanding, we assumed that participants who spent < 0.17 s per word (i.e., 60 s/350 words) did not read the instructions seriously. These participants were excluded from the analyses. Due to drop-outs, we decided to split the analyses to include as many participants as possible. We had a final sample of 170 students (Mage = 19.54, SD = 1.93; 57 female) for the pretest to immediate posttest analyses, a subsample of 155 students for the immediate to 3-week delayed posttest analyses (Mage = 19.46, SD = 1.91; 54 female), and a subsample of 82 students (46%) for the 3-week delayed to 9-month delayed posttest analyses (Mage = 19.27, SD = 1.79; 25 female). We estimated the power of our analyses with G*Power (Faul et al., 2009), based on these sample sizes. The power for the crucial Practice Type × Test Moment interaction, under a fixed alpha level of 0.05 and with a correlation between measures of 0.3 (e.g., Van Peppen et al., 2018), for detecting a small (ηp² = .01), medium (ηp² = .06), and large effect (ηp² = .14), respectively, is estimated at .42, > .99, and 1.00 for the pretest to immediate posttest analyses; .39, > .99, and 1.00 for the immediate to 3-week delayed posttest analyses; and .21, .90, and > .99 for the 3-week to 9-month delayed posttest analyses. Thus, the power of our study should be sufficient to pick up medium-sized interaction effects.
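For readers who want to check such estimates without G*Power, the power for this interaction can be approximated in a few lines of Python. The sketch below is not the authors’ code; it assumes G*Power’s noncentrality parameterization for a repeated-measures within-between interaction (λ = f²·N·m/(1 − ρ), with f² = ηp²/(1 − ηp²), m repeated measurements, and sphericity ε = 1), which should roughly reproduce the values reported above.

```python
# Minimal sketch (not the authors' code): approximate power for the
# Practice Type x Test Moment interaction in a 4-group, 2-measurement mixed design,
# assuming G*Power's parameterization lambda = f^2 * N * m / (1 - rho), epsilon = 1.
from scipy import stats

def interaction_power(eta_p2, n_total, n_groups=4, n_measurements=2, rho=0.3, alpha=0.05):
    f2 = eta_p2 / (1 - eta_p2)                       # Cohen's f^2 from partial eta squared
    lam = f2 * n_total * n_measurements / (1 - rho)  # noncentrality parameter
    df1 = (n_groups - 1) * (n_measurements - 1)      # interaction numerator df
    df2 = (n_total - n_groups) * (n_measurements - 1)
    f_crit = stats.f.ppf(1 - alpha, df1, df2)        # critical F under the null
    return stats.ncf.sf(f_crit, df1, df2, lam)       # power = P(F' > F_crit | lambda)

for eta_p2 in (0.01, 0.06, 0.14):                    # small, medium, large effects
    print(eta_p2, round(interaction_power(eta_p2, n_total=170), 2))
```

Setting n_total to 155 or 82 in the same function should approximate the estimates reported for the delayed-posttest analyses.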

Students participated in a pretest-intervention–posttest design (see Fig.  1 ). After completing the pretest on learning items (i.e., instructed and practiced during the practice phase), all participants received succinct CT instructions and two correct worked examples. Thereafter, they were randomly assigned to one of four conditions that differed in practice activities during the practice phase: they either (1) compared correct and erroneous examples (‘contrasting examples’, n  = 41; n  = 35; n  = 20); (2) studied correct examples (i.e., step-by-step solutions to unbiased reasoning) and explained why these were right (‘correct examples’, n  = 43; n  = 40; n  = 21); (3) studied erroneous examples (i.e., step-by-step incorrect solutions including biased reasoning) and explained why these were wrong (‘erroneous examples’, n  = 43; n  = 40; n  = 18); or (4) solved practice problems and justified their answers (‘practice problems’, n  = 43; n  = 40; n  = 23). A detailed explanation of the practice activities can be found in the CT-practice subsection below. Immediately after the practice phase and after a 3-week delay, participants completed a posttest on learning items (i.e., instructed and practiced during the practice phase) and transfer items (i.e., not instructed and practiced during the practice phase). Additionally, some students took a posttest after a 9-month delay. Further CT-instructions were given (in three lessons of approx. 90 min) in-between the second session of the experiment and the 9-month follow up. In these lessons, for example, the origins of the concept of CT, inductive and deductive reasoning, and the Toulmin model of argument were discussed. Thus, these data were exploratively analyzed and need to be interpreted with caution.

In the following paragraphs, we describe the learning materials, the instruments and associated measures, and the characteristics of the experimental conditions.

CT-skills tests

The CT-skills tests consisted of classic heuristics and biases tasks that reflected important aspects of CT. In all tasks, belief bias played a role, that is, the tendency to accept an invalid conclusion because it aligns with prior beliefs or real-world knowledge, or to reject a valid conclusion because it does not (Evans et al., 1983; Markovits & Nantel, 1989; Newstead et al., 1992). These tasks require that one recognizes the need for analytical and reflective reasoning (i.e., reasoning based on knowledge and rules of logic and statistics) and switches to this type of reasoning. This is only possible when heuristic responses are successfully inhibited.

The pretest consisted of six classic heuristics and biases items, across two categories (see Online Appendix A for an example of each category): syllogistic reasoning (i.e., logical reasoning) and conjunction (i.e., statistical reasoning) items. Three syllogistic reasoning items measured students’ tendency to be influenced by the believability of a conclusion that is inferred from two premises when evaluating the logical validity of that conclusion (adapted from Evans, 2002). For instance, the conclusion that cigarettes are healthy is logically valid given the premises that all things you can smoke are healthy and that you can smoke cigarettes. Most people, however, indicate that the conclusion is invalid because it does not align with their prior beliefs or real-world knowledge (i.e., belief bias, Evans et al., 1983). Three conjunction items examined to what extent the conjunction rule (P(A&B) ≤ P(B))—which states that the probability of two specific events both occurring can never exceed the probability of either of these events occurring alone—is neglected (Tversky & Kahneman, 1983). To illustrate, people tend to judge two things with a causal or correlational link, for example advanced age and the occurrence of heart attacks, as more probable than one of these on its own.
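To make the conjunction rule concrete, the check below (with invented probabilities, purely for illustration) shows why a conjunction such as “is over 55 and has a heart attack” can never be more probable than either of its components alone.

```python
# Illustration of the conjunction rule P(A & B) <= P(B), with invented numbers:
# the probability that a patient is over 55 AND has a heart attack can never
# exceed the probability that the patient is over 55.
p_over_55 = 0.30                     # P(B)
p_heart_attack_given_over_55 = 0.10  # P(A | B)
p_conjunction = p_heart_attack_given_over_55 * p_over_55  # P(A & B) = P(A | B) * P(B)

assert p_conjunction <= p_over_55    # holds for any value of P(A | B) in [0, 1]
print(f"P(A & B) = {p_conjunction:.2f} <= P(B) = {p_over_55:.2f}")
```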

The posttests consisted of parallel versions (i.e., structurally equivalent but with different surface features) of the six pretest items, which were instructed and practiced and thus served to assess differences in learning outcomes. Additionally, the posttests contained six items across two non-practiced categories that served to assess differences in transfer performance (see Online Appendix A for an example of each category). Three Wason selection items measured students’ tendency to test a rule by seeking evidence that verifies it rather than evidence that falsifies it (i.e., confirmation bias, adapted from Stanovich, 2011). Three base-rate items examined students’ tendency to incorrectly judge the likelihood of individual-case evidence (e.g., from personal experience, a single case, or prior beliefs) by not considering all relevant statistical information (i.e., base-rate neglect, adapted from Fong et al., 1986; Stanovich & West, 2000; Stanovich et al., 2016; Tversky & Kahneman, 1974). These transfer items shared similar features with the learning categories, namely, one category requiring knowledge and rules of logic (i.e., Wason selection tasks can be solved by applying syllogism rules) and one category requiring knowledge and rules of statistics (i.e., base-rate tasks can be solved by appropriate probability and data interpretation).

The cover stories of all test items were adapted to the domain of participants’ study program (i.e., Public Administration and Safety and Security Management). A multiple-choice (MC) format with different numbers of alternatives per item was used, with only one correct alternative for each item.

CT-instructions

All participants received a 12 min video-based instruction that started with emphasizing the importance of CT in general, describing the features of CT, and explaining which skills and attitudes are needed to think critically. Thereafter, explicit instructions on how to avoid biases in syllogistic reasoning and conjunction fallacies followed, consisting of two worked examples that showed the correct line of reasoning. The purpose of these explicit instructions was to provide students with knowledge on CT and to allow them to mentally correct initially incorrect responses on the items seen in the pretest.

CT-practice

Participants performed practice activities on the task categories that they had received instructions on (i.e., syllogistic reasoning and conjunction tasks). The CT-practice consisted of four practice tasks, two of each task category. Each practice task was again adapted to the study domain and started with the problem statement (see Online Appendix B for an example of a practice task of each condition). Participants in the correct examples condition were provided with a fictitious student’s correct solution and explanation of the problem, including auxiliary representations, and were prompted to explain why the solution steps were correct. Participants in the erroneous examples condition received a fictitious student’s erroneous solution to the problem, again including auxiliary representations. They were prompted to indicate the erroneous solution step and to provide the correct solution themselves. Participants in the contrasting examples condition were provided with a fictitious student’s correct and another fictitious student’s erroneous solution to the problem and were prompted to compare the two solutions and to indicate the erroneous solution and the erroneous solution step. Participants in the practice problems condition had to solve the problems themselves, that is, they were instructed to choose the best answer option and were asked to explain how the answer was obtained. Participants in all conditions were asked to read the practice tasks thoroughly. To minimize differences in time investment (i.e., the contrasting examples consisted of considerably more text), we added self-explanation prompts in the correct examples, erroneous examples, and practice problems conditions.

Mental effort

After each test item and practice task, participants were asked to report how much effort they had invested in completing that task or item on a 9-point subjective rating scale ranging from (1) very, very low effort to (9) very, very high effort (Paas, 1992). This scale, which is widely used in educational research (for overviews, see Paas et al., 2003b; Van Gog & Paas, 2008), is assumed to reflect the cognitive capacity actually allocated to accommodate the demands imposed by the task or item (Paas et al., 2003a).

Procedure

The study was run during the first two lessons of a mandatory first-year CT-course in two very similar Security and Governance study programs. Participants were not given CT-instructions in between these lessons. They completed the study in a computer classroom at their university, with an entire class of students, their teacher, and the experiment leader (first author) present. When entering the classroom, participants were instructed to sit down at one of the desks and read an A4 paper containing some general instructions and a link to the computer-based environment (Qualtrics platform). The first experimental session (ca. 90 min) began with obtaining written consent from all participants. Then, participants filled out a demographic questionnaire and completed the pretest. Next, participants entered the practice phase, in which they first viewed the video-based CT-instructions and were then assigned to one of the four practice conditions. Immediately after the practice phase, participants completed the immediate posttest. Approximately 3 weeks later, participants took the delayed posttest (ca. 20 min) in their computer classrooms. Additionally, students of the Safety and Security Management program took the 9-month delayed posttest during the first mandatory CT-lesson of their second study year, Footnote 2 which was exactly the same as the 3-week delayed posttest. During all experimental sessions, participants could work at their own pace and were allowed to use scrap paper. Time-on-task was logged during all phases and participants had to indicate after each test item and practice task how much effort they had invested. Participants had to wait (in silence) until the last participant had finished before they were allowed to leave the classroom.

Data analysis

All test items were MC-only questions, except for one learning item and one transfer item with only two alternatives (a conjunction item and a base-rate item), which were MC-plus-motivation questions to prevent participants from guessing. Items were scored for accuracy, that is, unbiased reasoning: 1 point for each correct alternative on the MC-only questions, or a maximum of 1 point (increasing in steps of 0.5) for a correct explanation on the MC-plus-motivation questions, using a coding scheme that can be found on our OSF-page. Because two transfer items (i.e., one Wason selection item and one base-rate item) appeared to substantially reduce the reliability of the transfer performance measure, presumably as a result of low variance due to floor effects, we decided to omit these items from our analyses. As a result, participants could attain a maximum total score of 6 on the learning items and a maximum score of 4 on the transfer items. For comparability, learning and transfer outcomes were computed as percentage correct scores instead of total scores. Participants’ explanations on the open questions of the tests were coded by one rater, and another rater (the first author) coded 25% of the explanations of the immediate posttest. Intra-class correlation coefficients were 0.990 for the learning test items and 0.957 for the transfer test items. After the discrepancies were resolved by discussion, the primary rater’s codes were used in the analyses.
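As a rough illustration of this scoring pipeline, the sketch below converts hypothetical item scores to percentage-correct and estimates inter-rater agreement with an intraclass correlation via the pingouin package. The data, column names, and the choice of pingouin are assumptions made for illustration; this is not the authors’ analysis code.

```python
import pandas as pd
import pingouin as pg

# Hypothetical scores on the six learning items (0, 0.5, or 1 per item) for two participants
scores = pd.DataFrame({
    "p1": [1, 1, 0.5, 0, 1, 1],
    "p2": [0, 1, 1, 0.5, 0, 1],
})
percentage_correct = scores.sum() / len(scores) * 100  # max total of 6 -> percentage correct

# Hypothetical double-coded explanations (the 25% subsample coded by a second rater)
ratings = pd.DataFrame({
    "item":  [1, 2, 3, 4, 1, 2, 3, 4],
    "rater": ["r1", "r1", "r1", "r1", "r2", "r2", "r2", "r2"],
    "score": [1, 0.5, 0, 1, 1, 0.5, 0.5, 1],
})
icc = pg.intraclass_corr(data=ratings, targets="item", raters="rater", ratings="score")

print(percentage_correct)
print(icc[["Type", "ICC"]])
```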

Cronbach’s alpha on invested mental effort ratings during studying correct examples, studying erroneous examples, contrasting examples, and solving practice problems, respectively, was .87, .76, .77, and .65. Cronbach’s alpha on the learning items was .21, .42, .58, and .31 on the pretest, immediate posttest, 3-week delayed posttest, and 9-month delayed posttest, respectively. The low reliability on the pretest might be explained by the fact that a lack of prior knowledge forces participants to guess answers. As a result, inter-item correlations are low, which yields a low Cronbach’s alpha. Cronbach’s alpha on the transfer items was .31, .12, and .29 on the immediate, 3-week delayed, and 9-month delayed posttest, respectively. Cronbach’s alpha on the mental effort items belonging to the learning items was .73, .79, .81, and .76 on the pretest, immediate posttest, 3-week delayed posttest, and 9-month delayed posttest, respectively. Cronbach’s alpha on the mental effort items belonging to the transfer items was .71, .75, and .64 on the immediate posttest, 3-week delayed posttest, and 9-month delayed posttest, respectively. However, caution is required in interpreting the above values, because sample sizes such as ours do not seem to produce sufficiently precise alpha coefficients (e.g., Charter, 2003). Cronbach’s alpha is a statistic and therefore subject to sample fluctuations. Hence, one should be careful with drawing firm conclusions about the precision of Cronbach’s alpha in the population (the parameter) based on small sample sizes (in the reliability literature, samples of 300–400 are considered small, see for instance Charter, 2003; Nunally & Bernstein, 1994; Segall, 1994).
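For reference, Cronbach’s alpha can be computed directly from a participants-by-items score matrix; a minimal sketch with made-up scores (not the study data) is shown below.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                           # number of items
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Made-up scores of five participants on four items (1 = unbiased response, 0 = biased)
example = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
])
print(round(cronbach_alpha(example), 2))
```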

There was no significant difference on pretest performance between participants who stayed in the study and those who dropped out after the first session, t (172) = .38, p  = .706, and those who dropped out after the second session, t (172) = − 1.46, p  = .146. Furthermore, there was no significant difference in educational background between participants who stayed in the study and those who dropped out after the first session, r (172) = .13, p  = .087, and those who dropped out after the second session, r (172) = − .01, p  = .860. Finally, there was no significant difference in age between participants who stayed in the study and those who dropped out after the first session, t (172) = − 1.51, p  = .134, but there was a difference between participants who stayed in the study and those who dropped out after the second session, t (172) = − 2.02, p  = .045. However, age did not correlate significantly with learning performance (minimum p  = .553) and was therefore not a confounding variable.

Additionally, participants’ performance during the practice phase was scored for accuracy, that is, unbiased reasoning. In each condition, participants could attain a maximum score of 2 points (increasing in steps of 0.5) for the correct answer on each problem (either MC-only answers or MC-plus-explanation answers), resulting in a maximum total score of 8. The explanations given during practice were coded for explicit relations to the principles that were communicated in the instructions (i.e., principle-based explanations; Renkl, 2014 ). For instance, participants earned the full 2 points if they explained in a conjunction task that the first statement is part of the second statement and that the first statement therefore can never be more likely than the two statements combined. Participants’ explanations were coded by the first author and another rater independently coded 25% of the explanations. Intra-class correlation coefficients were 0.941, 0.946, and 0.977 for performance in the correct examples, erroneous examples, and practice problems conditions respectively (contrasting examples consisted of MC-only questions). After a discussion between the raters about the discrepancies, the primary rater’s codes were updated and used in the exploratory analyses.

For all analyses in this paper, a p-value of .05 was used as the threshold for statistical significance. Partial eta-squared (ηp²) is reported as an effect size for all ANOVAs (see Table 3), with ηp² = .01, ηp² = .06, and ηp² = .14 denoting small, medium, and large effects, respectively (Cohen, 1988). Cramer’s V is reported as an effect size for chi-square tests, with V = .07, V = .21, and V = .35 (for 2 degrees of freedom) denoting small, medium, and large effects, respectively.
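The two effect-size conventions can be written out explicitly; the sketch below (with arbitrary example values, not results from this study) shows how partial eta-squared follows from an F-ratio and its degrees of freedom, and how Cramer’s V follows from a chi-square statistic.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    # eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cramers_v(chi2, n, n_rows, n_cols):
    # V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# Arbitrary example values, purely for illustration
print(round(partial_eta_squared(f_value=4.0, df_effect=3, df_error=166), 3))
print(round(cramers_v(chi2=3.0, n=170, n_rows=4, n_cols=2), 3))
```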

Preliminary analyses

Check on condition equivalence

Before running any of the main analyses, we checked our conditions for equivalence. Preliminary analyses confirmed that there were no a priori differences between the conditions in educational background, χ²(15) = 15.57, p = .411, V = .18; gender, χ²(3) = 1.21, p = .750, V = .08; performance on the pretest, F(3, 165) = 0.42, p = .739, ηp² = .01; time spent on the pretest, F(3, 165) = 0.16, p = .926, η² < .01; or mental effort invested on the pretest, F(3, 165) = 0.80, p = .498, η² = .01. Further, we estimated two multiple regression models (learning and transfer) with practice type and performance on the pretest as explanatory variables, including the interaction between practice type and performance on the pretest. There was no evidence of an interaction effect (learning: R² = .07, F(1, 166) = .296, p = .587; transfer: R² = .07, F(1, 166) = .260, p = .611) and we can, therefore, conclude that the relationship between practice type and performance on the posttest does not depend on performance on the pretest.
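The sketch below illustrates how such randomization checks could be run in Python with scipy and statsmodels; the data frame and its column names (condition, gender, pretest, posttest_learning) are hypothetical, and the calls are not taken from the authors’ analysis scripts.

```python
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

def equivalence_checks(df: pd.DataFrame):
    # Chi-square test: gender distribution across the four conditions
    gender_table = pd.crosstab(df["condition"], df["gender"])
    chi2, p_chi2, dof, _ = stats.chi2_contingency(gender_table)

    # One-way ANOVA: pretest performance across conditions
    groups = [g["pretest"].values for _, g in df.groupby("condition")]
    f_val, p_anova = stats.f_oneway(*groups)

    # Regression including a condition x pretest interaction on posttest performance
    model = smf.ols("posttest_learning ~ C(condition) * pretest", data=df).fit()

    return {"chi2": (chi2, p_chi2, dof), "anova": (f_val, p_anova),
            "interaction_model": model.summary()}
```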

Check on time-on-task

Levene’s test for equality of variances was significant, F(3, 166) = 9.57, p < .001. Therefore, a Brown–Forsythe one-way ANOVA was conducted. This analysis revealed a significant time-on-task (in seconds) difference between the conditions during practice, F(3, 120.28) = 16.19, p < .001, η² = .22. Pairwise comparisons showed that time-on-task was comparable between erroneous examples (M = 862.79, SD = 422.43) and correct examples (M = 839.58, SD = 298.33), and between contrasting examples (M = 512.29, SD = 130.21) and practice problems (M = 500.41, SD = 130.21). However, time-on-task was significantly higher in the first two conditions than in the latter two (erroneous examples = correct examples > contrasting examples = practice problems), all ps < .001. This should be considered when interpreting the results on effort and posttest performance.

Main analyses

Descriptive and test statistics are presented in Tables 2, 3, and 4. Correlations between several variables are presented in Table 5. It is important to realize that we measured mental effort as an indicator of overall experienced cognitive load. It is known, though, that the relation with learning depends on the origin of the experienced cognitive load: if it originates mainly from germane processes that contribute to learning, high load would correlate positively with test performance; if it originates from extraneous processes, it would correlate negatively with test performance. Caution is warranted in interpreting these correlations, however, because of the exploratory nature of these correlation analyses, which makes it impossible to control the probability of Type 1 errors. We also exploratively analyzed invested mental effort and time-on-task data on the posttests; however, these analyses did not have much added value for this paper and, therefore, are not reported here but are provided on our OSF-project page.

Performance during the practice phase

As each condition received different prompts during practice, performance during the practice phase could not be meaningfully compared between conditions and, therefore, we decided to report descriptive statistics only to describe the level of performance during the practice phase per condition (see Table 2 ). Descriptive statistics showed that participants earned more than half of the maximum total score while studying correct examples or engaging in contrasting examples. Participants who studied erroneous examples or solved practice problems performed worse during practice.

Mental effort during learning

A one-way ANOVA revealed a significant main effect of Practice Type on mental effort invested in the practice tasks. Contrary to Hypothesis 1, a Tukey post hoc test revealed that participants who solved practice problems invested significantly less effort (M = 4.28, SD = 1.11) than participants who engaged in contrasting examples (M = 5.08, SD = 1.29, p = .022) or studied erroneous examples (M = 5.17, SD = 1.19, p = .008). There were no other significant differences in effort investment between conditions. Interestingly, invested mental effort during contrasting examples correlated negatively with pretest to posttest performance gains on learning items, indicating that the experienced load originated mainly from extraneous processes (see Table 5).
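An analysis of this type (a one-way ANOVA followed by Tukey HSD comparisons) can be sketched as follows; the effort ratings are simulated here and only illustrate the procedure, not the study data.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated mean effort ratings (1-9 scale) per participant, 40 per condition
rng = np.random.default_rng(42)
conditions = ["contrasting", "correct", "erroneous", "practice"]
means = {"contrasting": 5.1, "correct": 4.9, "erroneous": 5.2, "practice": 4.3}
effort = np.concatenate([rng.normal(means[c], 1.2, 40) for c in conditions])
groups = np.repeat(conditions, 40)

# Omnibus one-way ANOVA across the four conditions
f_val, p_val = stats.f_oneway(*[effort[groups == c] for c in conditions])
print(f"F = {f_val:.2f}, p = {p_val:.3f}")

# Tukey HSD post hoc comparisons
print(pairwise_tukeyhsd(endog=effort, groups=groups, alpha=0.05))
```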

Test performance

The data on learning items were analyzed with two 2 × 4 mixed ANOVAs with Test Moment (pretest and immediate posttest/immediate posttest and 3-week delayed posttest) as within-subjects factor and Practice Type (correct examples, erroneous examples, contrasting examples, and practice problems) as between-subjects factor. Because transfer items were not included in the pretest, the data on transfer items were analyzed by a 2 × 4 mixed ANOVA with Test Moment (immediate posttest and 3-week delayed posttest) as within-subjects factor and Practice Type (correct examples, erroneous examples, contrasting examples, and practice problems) as between-subjects factor.
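In long format, such a 2 × 4 mixed ANOVA can be run, for example, with pingouin’s mixed_anova; the sketch below uses a small simulated data set with hypothetical column names and is only meant to show the structure of the analysis.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Simulated long-format data: 12 participants (3 per condition), 2 test moments each
rng = np.random.default_rng(0)
long_df = pd.DataFrame({
    "subject": np.repeat(range(12), 2),
    "practice_type": np.repeat(["contrasting", "correct", "erroneous", "practice"], 6),
    "test_moment": ["pretest", "posttest"] * 12,
    "score": rng.uniform(0, 100, 24),  # percentage correct on learning items
})

# Test Moment (within) x Practice Type (between) mixed ANOVA
aov = pg.mixed_anova(data=long_df, dv="score", within="test_moment",
                     subject="subject", between="practice_type")
print(aov[["Source", "F", "p-unc", "np2"]])
```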

Performance on learning items

In line with Hypothesis 2, the pretest-immediate posttest analysis showed a main effect of Test Moment on performance on learning items: participants’ performance improved from pretest ( M  = 27.26, SE  = 1.43) to immediate posttest ( M  = 49.98, SE  = 1.87). In contrast to Hypothesis 3, the results did not reveal a main effect of Practice Type, nor an interaction between Practice Type and Test Moment. The second analysis ( N  = 154)—to test whether effects are still present after 3 weeks—showed a main effect of Test Moment: participants performed better on the delayed posttest ( M  = 55.54, SE  = 2.16) compared to the immediate posttest ( M  = 50.95, SE  = 2.00). Again, contrary to our hypothesis, there was no main effect of Practice Type, nor an interaction between Practice Type and Test Moment.

Performance on transfer items

The results revealed no main effect of Test Moment. Moreover, in contrast to Hypothesis 4, the results did not reveal a main effect of Practice Type, nor an interaction between Practice Type and Test Moment. Footnote 3

Exploratory analyses

Participants from one of the study programs were tested again after a 9-month delay. Regarding performance on learning items, a 2 × 4 mixed ANOVA with Test Moment (3-week delayed posttest or 9-month delayed posttest) as within-subjects factor and Practice Type (correct examples, erroneous examples, contrasting examples, and practice problems) as between-subjects factor revealed a main effect of Test Moment (see Table 2 ): participants’ performance improved from 3-week delayed posttest ( M  = 53.30, SE  = 2.69) to 9-month delayed posttest ( M  = 63.00, SE  = 2.24). The results did not reveal a main effect of Practice Type, nor an interaction between Practice Type and Test Moment.

Regarding performance on transfer items , a 2 × 4 mixed ANOVA with Test Moment (3-week delayed posttest and 9-month delayed posttest) as within-subjects factor and Practice Type (correct examples, erroneous examples, contrasting examples, and practice problems) as between-subjects factor revealed a main effect of Test Moment (see Table 2 ): participants performed lower on the 3-week delayed test ( M  = 19.25, SE  = 1.60) than the 9-month delayed test ( M  = 24.84, SE  = 1.67). The results did not reveal a main effect of Practice Type, nor an interaction between Practice Type and Test Moment.

Discussion

Previous research has demonstrated that providing students with explicit instructions combined with practice on domain-relevant tasks is beneficial for learning to reason in an unbiased manner (Heijltjes et al., 2014a, 2014b, 2015), and that practice consisting of worked example study was more effective for novices’ learning than practice problem solving (Van Peppen et al., 2021). However, this was not sufficient to establish transfer to novel tasks. With the present study, we aimed to find out whether contrasting examples, which have proven effective for promoting transfer in other learning domains, would promote learning and transfer of reasoning skills.

Findings and implications

Our results corroborate the finding of previous studies (e.g., Heijltjes et al., 2015; Van Peppen et al., 2018, 2021) that providing students with explicit instructions and practice activities is effective for learning to avoid biased reasoning (Hypothesis 2), since we found considerable pretest to immediate posttest gains on practiced items. Moreover, our results revealed that participants’ performance improved even further after a 3-week and a 9-month delay, although the latter finding could also be attributed to the further instructions that were given in courses in between the 3-week and 9-month follow-up. That students improved in the longer term seems to indicate that our instructional intervention triggered active and deep processing and contributed to storage strength. Hence, our findings provide further evidence that a relatively brief instructional intervention including explicit instructions and practice opportunities is effective for learning of CT-skills, which is promising for educational practice.

In contrast to our expectations, however, we did not find any differences among conditions on either learning or transfer (Hypothesis 3). It is surprising that the present study did not reveal a beneficial effect of studying correct examples as opposed to practicing with problems, as this worked example effect has been demonstrated with many different tasks (Renkl, 2014 ; Van Gog et al., 2019 ), including heuristics-and-biases tasks (Van Peppen et al., 2021 ).

Given that most studies on the worked example effect use pure practice conditions or give minimal instructions prior to practice (e.g., Van Gog et al., 2019), whereas in the current study practice was preceded by instructions including two worked examples, one might wonder whether this contributed to the lack of effect. That is, the effect is usually not investigated in a context in which elaborate processing of instructions precedes practice, as in the current (classroom) study, and this may have affected the results. It seems possible that the CT-instructions already had a substantial effect on learning unbiased reasoning, making it difficult to find differential effects of different types of practice activities. This suggestion, however, contradicts the relatively low performance during the practice phase. Moreover, one could argue that if these instructions led to higher prior knowledge, they should render the correct worked examples less useful (cf. research on the ‘expertise reversal effect’) and should help those in the other practice conditions perform better on the practice problems, but we did not find that either. Furthermore, these instructions were also provided in a previous study in which a worked example effect was found in two experiments (Van Peppen et al., 2021). A major difference between that prior study and this one, however, is that in the present study, participants were prompted to self-explain while studying examples or solving practice problems. Prompting self-explanations seems to encourage students to engage in deep processing during learning (Chi et al., 1994), especially students with sufficient prior knowledge (Renkl & Atkinson, 2010). In the present study, this might have interfered with the usual worked example effect. However, the quality of the self-explanations was higher in the correct examples condition than in the problem-solving condition (i.e., the performance scores during the practice phase), making the absence of a worked example effect even more remarkable. Given that the worked example effect mainly occurs for novices, one could argue that participants in the current study had more prior knowledge than participants in that prior study; however, it concerned a similar group of students, and descriptive statistics showed that students performed comparably on average in both studies.

Another potential explanation might lie in the number of practice tasks, which differed between the prior study (nine tasks: Van Peppen et al., 2021) and the present study (four tasks), and which might moderate the effect of worked examples. The mean scores on the pretests as well as the performance progress in the practice problems condition were comparable with the previous study, but the progress in the worked example condition was considerably smaller. As it is crucial for a worked example effect that the worked-out solution procedures are understood, it might be that the effect did not emerge in the present study because participants did not receive enough worked examples during practice.

This might perhaps also explain why contrasting examples did not benefit learning or transfer in the present study. Possibly, students first need to gain a better understanding of the subject matter of heuristics-and-biases tasks before they are able to benefit from aligning the examples (Rittle-Johnson et al., 2009). In particular, the lack of transfer effects might be related to the duration or extensiveness of the practice activities; even though students learned to solve reasoning tasks, their subject knowledge may have been insufficient to solve novel tasks. As such, it can be argued that establishing transfer requires longer or more extensive practice. Contrasting examples seem to help students extend and refine their knowledge and skills through engaging in comparing activities and analyzing errors, that is, they seem to help them to correctly update schemas of correct concepts and strategies and to create schemas for erroneous strategies, reducing the probability of recurring erroneous solutions in the future. However, more attention may need to be paid to the acquisition of the new knowledge and its integration with what students already know (see the Dimensions of Learning framework; Marzano et al., 1993). Potentially, having contrasting examples preceded by a more extensive instruction phase to guarantee a better understanding of logical and statistical reasoning would enhance learning and establish transfer. Another possibility would be to provide more guidance in the contrasting examples, as has been done in previous studies, by explicitly marking the erroneous examples as incorrect and prompting students to reflect or elaborate on the examples (e.g., Durkin & Rittle-Johnson, 2012; Loibl & Leuders, 2018, 2019). It should be noted, though, that the lower time-on-task in the contrasting condition might also be indicative of a motivational problem: whereas the side-by-side presentation was intended to encourage deep processing, it might have had the opposite effect, in that students might have engaged in superficial processing, just scanning to see where the differences between the examples lay, without thinking much about the underlying principles. This idea is supported by the finding that invested mental effort during comparing correct and erroneous examples correlated negatively with performance gains on learning items, indicating that the experienced load originated mainly from extraneous processes. It would be interesting for future research to manipulate knowledge gained during instruction to investigate whether prior knowledge indeed moderates the effect of contrasting examples, and to examine the interplay between contrasting examples, reflection/elaboration prompts, and final test performance.

Another possible explanation for the lack of a beneficial effect of contrasting examples might be related to the self-explanation prompts that were provided in the correct examples, erroneous examples, and practice problems conditions. Although the prompts differ, it is important to note that the explicit instruction to compare the solution processes likely evokes self-explaining as well. The reason we added self-explanation prompts to the other conditions was to rule out an effect of prompting as such, as well as a potential effect of time-on-task (i.e., the text in the contrasting examples condition was considerably longer than in the other conditions). The positive effect of contrasting examples might have been negated by a positive effect of the self-explanation prompts given in the other conditions. However, had we found a positive effect of comparing, as we expected, our design would have increased the likelihood that this was due to the comparison process and not just to more in-depth processing or longer processing time through self-explaining. Unexpectedly, we did find time-on-task differences between conditions during practice, but this does not seem to affect our findings. Time-on-task during practice was not correlated with learning and transfer posttest performance. This is also apparent from the condition means: the conditions with the lowest time-on-task did not differ on learning and transfer from the conditions with the highest time-on-task.

The classroom setting might also explain why there were no differential effects of contrasting examples. This study was conducted as part of an existing course and the learning materials were relevant for the course and exam. Because of that, students’ willingness to invest effort in their performance may have been higher than is generally the case in psychological laboratory studies: their performance on such tasks actually mattered (intrinsically or extrinsically) to them. As such, students in the control conditions may have engaged in generative processing themselves, for instance by trying to compare the given correct (or erroneous) examples with internally represented erroneous (or correct) solutions. Therefore, it is possible that effects of generative processing strategies such as comparing correct and erroneous examples found in the psychological laboratory—where students participate to earn required research credits and the learning materials are not part of their study program—might not readily transfer to field experiments conducted in real classrooms.

The absence of differential effects of the practice activities on learning and transfer may also be related to the affective and attitudinal dimension of CT. Attitudes and perceptions about learning affect learning (Marzano et al., 1993 ), probably even more so in the CT-domain than in other learning domains. Being able to think critically relies heavily on the extent to which one possesses the requisite skills and is able to use these skills, but also on whether one is inclined to use these skills (i.e., thinking dispositions; Perkins et al., 1993 ).

The present study raises further questions about how transfer of CT-skills can be promoted. Although several studies have shown that, to enhance transfer of knowledge or skills, instructional strategies should contribute to storage strength through effortful learning conditions that trigger active and deep processing ( desirable difficulties ; e.g., Bjork & Bjork, 2011), the present study—once again (Van Peppen et al., 2018, 2021; Heijltjes et al., 2014a, 2014b, 2015)—showed that this may not apply to transfer of CT-skills. This lack of transfer could stem from inadequate recall of the acquired knowledge, failure to recognize that the acquired knowledge is relevant to the new task, and/or inability to actually map that knowledge onto the new task (Barnett & Ceci, 2002). Future research should therefore elucidate which of these mechanisms underlie the lack of transfer, to shed more light on how to promote transfer of CT-skills.

Limitations and strengths

One limitation of this study is that our measures showed low levels of reliability. Under these circumstances, the probability of detecting a significant effect—given that one exists—is low (e.g., Cleary et al., 1970; Rogers & Hopkins, 1988), and consequently, the chance that Type 2 errors have occurred in the current study is relatively high. In our study, the low levels of reliability can probably be explained by the multidimensional nature of the CT-test, that is, it taps multiple constructs that do not correlate with each other. Performance on these tasks depends not only on the extent to which a task elicits a bias (resulting from heuristic reasoning), but also on the extent to which a person possesses the requisite mindware (e.g., rules of logic or probability). Thus, systematic variance in performance on such tasks can be explained either by a person’s use of heuristics or by his/her available mindware. If it differs per item to what extent a correct answer depends on these two aspects, and if these aspects are not correlated, there may not be a common factor explaining all interrelationships between the measured items. Moreover, the reliability issue may have been exacerbated because multiple task types were included in the CT-skills tests, requiring different, and perhaps uncorrelated, types of mindware. Future research, therefore, would need to find ways to improve CT measures (i.e., decrease measurement error), for instance by narrowing the test down to a single measurable construct, or should use measures known to have acceptable levels of reliability (LeBel & Paunonen, 2011). The latter option seems challenging, however, as multiple studies report rather low levels of reliability for tests consisting of heuristics-and-biases tasks (Aczel et al., 2015; West et al., 2008) and have revealed concerns about the reliability of widely used standardized CT tests, particularly with regard to subscales (Bernard et al., 2008; Bondy et al., 2001; Ku, 2009; Leppa, 1997; Liu et al., 2014; Loo & Thorpe, 1999). This raises the question of whether these issues are related to the general construct of CT. To achieve further progress in research on instructional methods for teaching CT, more knowledge on the construct validity of CT in general, and of unbiased reasoning in particular, is needed. When the aim is to evaluate CT as a whole, one should perhaps move towards a more holistic measurement method, for instance by performing pairwise comparisons (i.e., comparative judgment; Bramley & Vitello, 2018; Lesterhuis et al., 2017). If, however, the intention is to measure specific aspects of CT, one should specify which aspect of CT is to be measured and select a suitable test for that aspect, mainly because individual aspects of CT may not be as strongly correlated as is often assumed and therefore cannot simply be combined into one scale.
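To make the link between low reliability and low statistical power concrete, here is a minimal numerical sketch (not part of the study’s analyses; the true effect size, sample size, and reliability values are hypothetical). It applies Spearman’s attenuation formula and a Fisher-z approximation of power to show how a fixed underlying effect becomes progressively harder to detect as test reliability drops.

```python
# Illustrative only: hypothetical numbers, not data from this study.
import math
from scipy import stats

def attenuated_r(true_r, rel_x, rel_y):
    # Spearman's attenuation: the observable correlation shrinks with the
    # square root of the product of the two measures' reliabilities.
    return true_r * math.sqrt(rel_x * rel_y)

def power_for_r(r, n, alpha=0.05):
    # Approximate two-sided power for testing rho = 0 via Fisher's z-transform.
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return (1 - stats.norm.cdf(z_crit - z_r / se)) + stats.norm.cdf(-z_crit - z_r / se)

true_r = 0.30   # hypothetical "true" association
n = 150         # hypothetical sample size
for reliability in (0.9, 0.6, 0.3):
    obs_r = attenuated_r(true_r, reliability, reliability)
    print(f"reliability={reliability:.1f}  observed r={obs_r:.2f}  "
          f"power={power_for_r(obs_r, n):.2f}")
```

With these made-up numbers, power drops steeply as reliability decreases, which illustrates why low-reliability measures raise the risk of Type 2 errors discussed above.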

Another point worth mentioning is that we opted for assessing invested mental effort, which reflects the amount of cognitive load students experienced. This measure is informative when combined with performance (for a more elaborate discussion, see Van Gog & Paas, 2008). Moreover, research has shown that it is important to measure cognitive load immediately after each task (e.g., Schmeck et al., 2015; Van Gog et al., 2012), and the mental effort rating scale (Paas, 1992) is easy to apply after each task. However, it unfortunately does not allow us to distinguish between different types of load. It should be noted, though, that this seems very challenging with other measurement instruments as well (e.g., Skulmowski & Rey, 2017). Also, instruments that might be suited for this purpose, for example the rating scale developed by Leppink et al. (2013), would have been too long to apply after each task in the present study.

A strength of the current study is that it was conducted in a real educational setting as part of an existing CT course, which increases ecological validity. Despite the wealth of worked-examples research, classroom studies are relatively rare. Interestingly, (multi-session) classroom studies on math and chemistry have also failed to find the worked example effect, although—in contrast to the present study—worked examples often did show clear efficiency benefits compared to practice problems (McLaren et al., 2016; Van Loon-Hillen et al., 2012). In line with our findings, a classroom study by Isotani et al. (2011) indicated that (high prior knowledge) students did not benefit more from studying erroneous examples than from correct examples or practice problems. As discussed above, the classroom setting might explain the absence of effects of generative processing strategies on learning and transfer. This suggests a theoretical implication, namely that the beneficial effects of such strategies might become smaller as students’ willingness to invest effort increases, and vice versa.

To conclude, based on the findings of the present study, comparing correct and erroneous examples (i.e., contrasting examples) does not seem to be a promising instructional method to further enhance learning and transfer of specific—and specifically tested—CT skills. Consequently, our findings raise questions about the preconditions of contrasting examples effects and effects of generative processing strategies in general, such as the setting in which they are presented to students. Further research on the exact boundary conditions, through solid laboratory and classroom studies, is therefore recommended. Moreover, this study provides valuable insights for educational practice. That is, providing students with explicit CT-instruction and the opportunity to practice with domain-relevant problems in a relatively short instructional intervention has the potential to improve learning. The format of the practice tasks does not seem to matter much, although a prior study did find a benefit of studying correct examples, which might therefore be the safest bet. Finally, this study again underlines the great difficulty of designing instructions to enhance CT-skills in such a way that these would also transfer across tasks/domains.

Data availability

All data, script files, and materials are provided on the Open Science Framework (OSF) project page that we created for this study (anonymized view-only link: https://osf.io/8zve4/?view_only=ca500b3aeab5406290310de34323457b ).

Code availability

Not applicable.

This study investigated effects of interleaved practice (as opposed to blocked practice) on students’ learning and transfer of unbiased reasoning. Given that interleaved practice seems to impose high cognitive load, which may hinder learning, it was additionally tested whether this effect interacts with the format of the practice tasks (i.e., correct examples or practice problems).

We had no access to another CT-lesson of the Public Administration program, so due to practical reasons, students of this program were not administered to the 9-month delayed posttest.

We also exploratively analyzed the learning and transfer data for each individual measurement point and we analyzed performance on single learning and transfer items. The outcomes did not deviate markedly from the findings on sum scores (i.e., no effects of Practice Type were found). Test statistics can be found on our OSF-project page and the descriptive statistics of performance per single item can be found in Table 4 .

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., & Zhang, D. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78 , 1102–1134. https://doi.org/10.3102/0034654308326084


Abrami, P. C., Bernard, R. M., Borokhovski, E., Waddington, D. I., Wade, C. A., & Persson, T. (2014). Strategies for teaching students to think critically: A meta-analysis. Review of Educational Research, 85 , 275–314. https://doi.org/10.3102/0034654314551063

Aczel, B., Bago, B., Szollosi, A., Foldes, A., & Lukacs, B. (2015). Measuring individual differences in decision biases: Methodological considerations. Frontiers in Psychology . https://doi.org/10.3389/fpsyg.2015.01770

Adams, D. M., McLaren, B. M., Durkin, K., Mayer, R. E., Rittle-Johnson, B., Isotani, S., & Van Velsen, M. (2014). Using erroneous examples to improve mathematics learning with a web-based tutoring system. Computers in Human Behavior, 36 , 401–411. https://doi.org/10.1016/j.chb.2014.03.053

Angeli, C., & Valanides, N. (2009). Instructional effects on critical thinking: Performance on ill-defined issues. Learning and Instruction, 19 , 322–334. https://doi.org/10.1016/j.learninstruc.2008.06.010

Arum, R., & Roksa, J. (2011). Limited learning on college campuses. Society, 48 , 203–207. https://doi.org/10.1007/s12115-011-9417-8

Atkinson, R. K., Renkl, A., & Merrill, M. M. (2003). Transitioning from studying examples to solving problems: Effects of self-explanation prompts and fading worked-out steps. Journal of Educational Psychology, 95 , 774–783. https://doi.org/10.1037/0022-0663.95.4.774

Barbieri, C., & Booth, J. L. (2016). Support for struggling students in algebra: Contributions of incorrect worked examples. Learning and Individual Differences, 48 , 36–44. https://doi.org/10.1016/j.lindif.2016.04.001

Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn?: A taxonomy for far transfer. Psychological Bulletin, 128 , 612–636. https://doi.org/10.1037/0033-2909.128.4.612

Beaulac, G. & Kenyon, T. (2014). Critical thinking education and debiasing. Informal Logic, 34 , 341–363. https://doi.org/10.22329/il.v34i4.4203

Bernard, R. M., Zhang, D., Abrami, P. C., Sicoly, F., Borokhovski, E., & Surkes, M. A. (2008). Exploring the structure of the Watson-Glaser Critical Thinking Appraisal: One scale or many subscales? Thinking Skills and Creativity, 3 , 15–22. https://doi.org/10.1016/j.tsc.2007.11.001

Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 59–68). Worth Publishers.


Bondy, K. N., Koenigseder, L. A., Ishee, J. H., & Williams, B. G. (2001). Psychometric properties of the California Critical Thinking Tests. Journal of Nursing Measurement, 9 , 309–328. https://doi.org/10.1891/1061-3749.9.3.309

Booth, J. L., Lange, K. E., Koedinger, K. R., & Newton, K. J. (2013). Using example problems to improve student learning in algebra: Differentiating between correct and incorrect examples. Learning and Instruction, 25 , 24–34. https://doi.org/10.1016/j.learninstruc.2009.10.001

Booth, J. L., Oyer, M. H., Paré-Blagoev, E. J., Elliot, A. J., Barbieri, C., Augustine, A., & Koedinger, K. R. (2015). Learning algebra by example in real-world classrooms. Journal of Research on Educational Effectiveness, 8 , 530–551. https://doi.org/10.1080/19345747.2015.1055636

Bramley, T., & Vitello, S. (2018). The effect of adaptivity on the reliability coefficient in adaptive comparative judgement. Assessment in Education: Principles, Policy & Practice, 2018 , 1–16. https://doi.org/10.1080/0969594X.2017.1418734

Charter, R. A. (2003). Study samples are too small to produce sufficiently precise reliability coefficients. The Journal of General Psychology, 130 , 117–129.

Chi, M. T. H., de Leeuw, N., Chiu, M., & LaVancher, C. (1994). Eliciting self-explanation improves understanding. Cognitive Science, 18 , 439–477. https://doi.org/10.1207/s15516709cog1803_3

Cleary, T. A., Linn, R. L., & Walster, G. W. (1970). Effect of reliability and validity on power of statistical tests. Sociological Methodology, 2 , 130–138. https://doi.org/10.1037/a0031026

Cohen, J (1988). Statistical power analysis for the behavioral sciences (2nd. ed., reprint). Psychology Press.

Davies, M. (2013). Critical thinking and the disciplines reconsidered. Higher Education Research & Development, 32 , 529–544. https://doi.org/10.1080/07294360.2012.697878

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14 , 4–58. https://doi.org/10.1177/1529100612453266

Durkin, K., & Rittle-Johnson, B. (2012). The effectiveness of using incorrect examples to support learning about decimal magnitude. Learning and Instruction, 22 , 206–214. https://doi.org/10.1016/j.learninstruc.2011.11.001

Ennis, R. H. (1989). Critical thinking and subject specificity: Clarification and needed research. Educational Researcher, 18 , 4–10. https://doi.org/10.3102/0013189X018003004

Evans, J. S. B. (2002). Logic and human reasoning: An assessment of the deduction paradigm. Psychological Bulletin, 128 , 978–996. https://doi.org/10.1037/0033-2909.128.6.978

Evans, J. S. B. (2003). In two minds: Dual-process accounts of reasoning. Trends in Cognitive Sciences, 7 , 454–459. https://doi.org/10.1016/j.tics.2003.08.012

Evans, J. S. B., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11 , 295–306. https://doi.org/10.3758/BF03196976

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction . The California Academic Press.

Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G* Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41 , 1149–1160. https://doi.org/10.3758/BRM.41.4.1149 .

Flores, K. L., Matkin, G. S., Burbach, M. E., Quinn, C. E., & Harding, H. (2012). Deficient critical thinking skills among college graduates: Implications for leadership. Educational Philosophy and Theory, 44 , 212–230. https://doi.org/10.1111/j.1469-5812.2010.00672.x

Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on thinking about everyday problems. Cognitive Psychology, 18 , 253–292. https://doi.org/10.1016/0010-0285(86)90001-0

Ginns, P. (2006). Integrating information: A meta-analysis of the spatial contiguity and temporal contiguity effects. Learning and Instruction, 16 , 511–525. https://doi.org/10.1016/j.learninstruc.2006.10.001

Grabowski, B. (1996). Generative learning. Past, present, and future. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (pp. 897–918). Macimillian Library Reference.

Große, C. S., & Renkl, A. (2007). Finding and fixing errors in worked examples: Can this foster learning outcomes? Learning and Instruction, 17 , 612–634. https://doi.org/10.1016/j.learninstruc.2007.09.008

Halpern, D. F. (2014). Critical thinking across the curriculum: A brief edition of thought & knowledge . Routledge.


Heijltjes, A., Van Gog, T., Leppink, J., & Paas, F. (2014a). Improving critical thinking: Effects of dispositions and instructions on economics students’ reasoning skills. Learning and Instruction, 29 , 31–42. https://doi.org/10.1016/j.learninstruc.2013.07.003

Heijltjes, A., Van Gog, T., & Paas, F. (2014b). Improving students’ critical thinking: Empirical support for explicit instructions combined with practice. Applied Cognitive Psychology, 28 , 518–530. https://doi.org/10.1002/acp.3025

Heijltjes, A., Van Gog, T., Leppink, J., & Paas, F. (2015). Unraveling the effects of critical thinking instructions, practice, and self-explanation on students’ reasoning performance. Instructional Science, 43 , 487–506. https://doi.org/10.1002/acp.3025

Huber, C. R., & Kuncel, N. R. (2016). Does college teach critical thinking? A meta-analysis. Review of Educational Research, 86 , 431–468. https://doi.org/10.3102/0034654315605917

Ibiapina, C., Mamede, S., Moura, A., Elói-Santos, S., & van Gog, T. (2014). Effects of free, cued and modelled reflection on medical students’ diagnostic competence. Medical Education, 48 , 796–805. https://doi.org/10.1111/medu.12435

Isotani, S., Adams, D., Mayer, R. E., Durkin, K., Rittle-Johnson, B., & McLaren, B. M. (2011). Can erroneous examples help middle-school students learn decimals? In Proceedings of the sixth European conference on technology enhanced learning: Towards ubiquitous learning (EC-TEL-2011) .

Kalyuga, S. (2011). Cognitive load theory: How many types of load does it really need? Educational Psychology Review, 23 , 1–19. https://doi.org/10.1007/s10648-010-9150-7

Kassin, S. M., Dror, I. E., & Kukucka, J. (2013). The forensic confirmation bias: Problems, perspectives, and proposed solutions. Journal of Applied Research in Memory and Cognition, 2 , 42–52. https://doi.org/10.1016/j.jarmac.2013.01.001 .

Kawasaki, M. (2010). Learning to solve mathematics problems: The impact of incorrect solutions in fifth grade peers’ presentations. Japanese Journal of Developmental Psychology, 21 , 12–22.

Ku, K. Y. L. (2009). Assessing students’ critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4 , 70–76. https://doi.org/10.1016/j.tsc.2009.02.001

Lai, E. R. (2011). Critical thinking: A literature review. Pearson’s Research Reports, 6 , 40–41.

LeBel, E. P., & Paunonen, S. V. (2011). Sexy but often unreliable: The impact of unreliability on the replicability of experimental findings with implicit measures. Personality and Social Psychology Bulletin, 37 , 570–583. https://doi.org/10.1177/0146167211400619

Leppa, C. J. (1997). Standardized measures of critical thinking: Experience with the California Critical Thinking Tests. Nurse Education, 22 , 29–33.

Leppink, J., Paas, F., Van der Vleuten, C. P., Van Gog, T., & Van Merriënboer, J. J. (2013). Development of an instrument for measuring different types of cognitive load. Behavior Research Methods, 45 , 1058–1072. https://doi.org/10.3758/s13428-013-0334-1

Lesterhuis, M., Verhavert, S., Coertjens, L., Donche, V., & De Mayer, S. (2017). Comparative judgement as a promising alternative to score competences. In E. Cano & G. Ion (Eds.), Innovative practices for higher education assessment and measurement (pp. 119–138). IGI Global. https://doi.org/10.4018/978-1-5225-0531-0.ch007


Liu, O. L., Frankel, L., & Roohr, K. C. (2014). Assessing critical thinking in higher education: Current state and directions for next-generation assessment. ETS Research Report Series, 2014 , 1–23. https://doi.org/10.1002/ets2.12009

Loibl, K., & Leuders, T. (2018). Errors during exploration and consolidation—The effectiveness of productive failure as sequentially guided discovery learning. Journal für Mathematik-Didaktik, 39 , 69–96. https://doi.org/10.1007/s13138-018-0130-7

Loibl, K., & Leuders, T. (2019). How to make failure productive: Fostering learning from errors through elaboration prompts. Learning and Instruction, 62 , 1–10. https://doi.org/10.1016/j.learninstruc.2019.03.002

Loo, R., & Thorpe, K. (1999). A psychometric investigation of scores on the Watson-Glaser critical thinking appraisal new forms. Educational and Psychological Measurement, 59 , 995–1003. https://doi.org/10.1177/00131649921970305

Markovits, H., & Nantel, G. (1989). The belief-bias effect in the production and evaluation of logical conclusions. Memory & Cognition, 17 , 11–17. https://doi.org/10.3758/BF03199552

Marzano, R. J., Pickering, D., & McTighe, J. (1993). Assessing student outcomes: Performance assessment using the Dimensions of Learning Model . Association for Supervision and Curriculum Development.

McLaren, B. M., Adams, D. M., & Mayer, R. E. (2015). Delayed learning effects with erroneous examples: A study of learning decimals with a web-based tutor. International Journal of Artificial Intelligence in Education, 25 , 520–542. https://doi.org/10.1007/s40593-015-0064-x

McLaren, B. M., Van Gog, T., Ganoe, C., Karabinos, M., & Yaron, D. (2016). The efficiency of worked examples compared to erroneous examples, tutored problem solving, and problem solving in computer-based learning environments. Computers in Human Behavior, 55 , 87–99. https://doi.org/10.1016/j.chb.2015.08.038

Moore, T. (2004). The critical thinking debate: How general are general thinking skills? Higher Education Research & Development, 23 , 3–18. https://doi.org/10.1080/0729436032000168469

Newstead, S. E., Pollard, P., Evans, J. St. B. T., & Allen, J. L. (1992). The source of belief bias effects in syllogistic reasoning. Cognition, 45 , 257–284. https://doi.org/10.1016/0010-0277(92)90019-E

Nievelstein, F., Van Gog, T., Van Dijck, G., & Boshuizen, H. P. (2013). The worked example and expertise reversal effect in less structured tasks: Learning to reason about legal cases. Contemporary Educational Psychology, 38 , 118–125. https://doi.org/10.1007/s11251-008-9076-3

Niu, L., Behar-Horenstein, L. S., & Garvan, C. W. (2013). Do instructional interventions influence college students’ critical thinking skills? A meta-analysis. Educational Research Review, 9 , 114–128. https://doi.org/10.1016/j.edurev.2012.12.002

Nunally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Osborne, R. J., & Wittrock, M. C. (1983). Learning science: A generative process. Science Education, 67 , 489–508. https://doi.org/10.1002/sce.3730670406

Paas, F. (1992). Training strategies for attaining transfer or problem solving skills in statistics: A cognitive-load approach. Journal of Educational Psychology, 84 , 429–434. https://doi.org/10.1037/0022-0663.84.4.429

Paas, F., Renkl, A., & Sweller, J. (2003a). Cognitive load theory and instructional design: Recent developments. Educational Psychologist, 38 , 1–4. https://doi.org/10.1207/S15326985EP3801_1

Paas, F., Tuovinen, J. E., Tabbers, H., & Van Gerven, P. W. (2003b). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38 , 63–71. https://doi.org/10.1207/S15326985EP3801_8

Pascarella, E. T., Blaich, C., Martin, G. L., & Hanson, J. M. (2011). How robust are the findings of Academically Adrift? Change: The Magazine of Higher Learning, 43 , 20–24. https://doi.org/10.1080/00091383.2011.568898

Perkins, D. N., & Salomon, G. (1992). Transfer of learning. In T. Husen & T. N. Postelwhite (Eds.), The international encyclopedia of educational (2nd ed., Vol. 11, pp. 6452–6457). Pergamon Press.

Perkins, D. N., Jay, E., & Tishman, S. (1993). Beyond abilities: A dispositional theory of thinking. MerrillPalmer Quarterly, 39 , 1–21.

Renkl, A. (1999). Learning mathematics from worked-out examples: Analyzing and fostering self-explanations. European Journal of Psychology of Education, 14 , 477–488. https://doi.org/10.1007/BF03172974

Renkl, A. (2014). Toward an instructionally oriented theory of example-based learning. Cognitive Science, 38 , 1–37. https://doi.org/10.1111/cogs.12086

Renkl, A., & Atkinson, R. K. (2010). Learning from worked-out examples and problem solving. In J. Plass, R. Moreno, & R. Brünken (Eds.), Cognitive load theory and research in educational psychology (pp. 89–108). Cambridge University Press.

Renkl, A., & Eitel, A. (2019). Self-explaining: Learning about principles and their application. In J. Dunlosky & K. Rawson (Eds.), The Cambridge handbook of cognition and education (pp. 528–549). Cambridge University Press.

Renkl, A., Hilbert, T., & Schworm, S. (2009). Example-based learning in heuristic domains: A cognitive load theory account. Educational Psychology Review, 21 , 67–78. https://doi.org/10.1007/s10648-008-9093-4

Ritchhart, R., & Perkins, D. N. (2005). Learning to think: The challenges of teaching thinking. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 775–802). Cambridge University Press.

Rittle-Johnson, B., Star, J. R., & Durkin, K. (2009). The importance of prior knowledge when comparing examples: Influences on conceptual and procedural knowledge of equation solving. Journal of Educational Psychology, 101, 836–852. https://doi.org/10.1037/a0016026 .

Roelle, J., & Berthold, K. (2015). Effects of comparing contrasting cases on learning from subsequent explanations. Cognition and Instruction, 33 , 199–225. https://doi.org/10.1080/07370008.2015.1063636

Rogers, W. T., & Hopkins, K. D. (1988). Power estimates in the presence of a covariate and measurement error. Educational and Psychological Measurement, 48 , 647–656. https://doi.org/10.1177/0013164488483008

Salomon, G., & Perkins, D. N. (1989). Rocky roads to transfer: Rethinking mechanism of a neglected phenomenon. Educational Psychologist, 24 , 113–142. https://doi.org/10.1207/s15326985ep2402_1 .

Schmeck, A., Opfermann, M., Van Gog, T., Paas, F., & Leutner, D. (2015). Measuring cognitive load with subjective rating scales during problem solving: Differences between immediate and delayed ratings. Instructional Science, 43 , 93–114. https://doi.org/10.1007/s11251-014-9328-3

Schworm, S., & Renkl, A. (2007). Learning argumentation skills through the use of prompts for self-explaining examples. Journal of Educational Psychology, 99, 285–296. https://doi.org/10.1037/0022-0663.99.2.285

Segall, D. O. (1994). The reliability of linearly equated tests. Psychometrika, 59 , 361–375.

Siegler, R. S. (2002). Microgenetic studies of self-explanations. In N. Grannot & J. Parziale (Eds.), Microdevelopment: Transition processs in development and learning (pp. 31–58). Cambridge University Press.

Skulmowski, A., & Rey, G. D. (2017). Measuring cognitive load in embodied learning settings. Frontiers in Psychology, 8 , 1191. https://doi.org/10.3389/fpsyg.2017.01191

Stanovich, K. E. (2011). Rationality and the reflective mind . Oxford University Press.

Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23 , 645–665. https://doi.org/10.1017/S0140525X00003435

Stanovich, K. E., West, R. K., & Toplak, M. E. (2016). The rationality quotient: Toward a test of rational thinking . MIT Press.

Stark, R., Kopp, V., & Fischer, M. R. (2011). Case-based learning with worked examples in complex domains: Two experimental studies in undergraduate medical education. Learning and Instruction, 21 , 22–33. https://doi.org/10.1016/j.learninstruc.2009.10.001

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12 , 257–285. https://doi.org/10.1207/s15516709cog1202_4

Sweller, J., Van Merriënboer, J. J., & Paas, F. G. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10 , 251–296.

Sweller, J., Ayres, P., & Kalyuga, S. (Eds.). (2011). Measuring cognitive load. In Cognitive load theory (pp. 71–85). Springer.

Tiruneh, D. T., Verburgh, A., & Elen, J. (2014). Effectiveness of critical thinking instruction in higher education: A systematic review of intervention studies. Higher Education Studies, 4 , 1–17. https://doi.org/10.5539/hes.v4n1p1

Tiruneh, D. T., Weldeslassie, A. G., Kassa, A., Tefera, Z., De Cock, M., & Elen, J. (2016). Systematic design of a learning environment for domain-specific and domain-general critical thinking skills. Educational Technology Research and Development, 64 , 481–505. https://doi.org/10.1007/s11423-015-9417-2

Trauzettel-Klosinski, S., & Dietz, K. (2012). Standardized assessment of reading performance: The new International Reading Speed Texts IReST. Investigative Ophthalmology & Visual Science, 53 , 5452–5461. https://doi.org/10.1167/iovs.11-8284 .

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185 , 1124–1131. https://doi.org/10.1126/science.185.4157.1124

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review , 90, 293–315. https://psycnet.apa.org . https://doi.org/10.1037/0033-295X.90.4.293

Van den Broek, P., & Kendeou, P. (2008). Cognitive processes in comprehension of science texts: The role of co-activation in confronting misconceptions. Applied Cognitive Psychology, 22 , 335–351. https://doi.org/10.1002/acp.1418

Van Gelder, T. V. (2005). Teaching critical thinking: Some lessons from cognitive science. College Teaching, 53 , 41–48. https://doi.org/10.3200/CTCH.53.1.41-48

Van Gog, T., & Paas, F. (2008). Instructional efficiency: Revisiting the original construct in educational research. Educational Psychologist, 43 , 16–26. https://doi.org/10.1080/00461520701756248

Van Gog, T., Paas, F., & Van Merriënboer, J. J. (2004). Process-oriented worked examples: Improving transfer performance through enhanced understanding. Instructional Science, 32 , 83–98. https://doi.org/10.1023/B:TRUC.0000021810.70784.b0

Van Gog, T., Kirschner, F., Kester, L., & Paas, F. (2012). Timing and frequency of mental effort measurement: Evidence in favour of repeated measures. Applied Cognitive Psychology, 26 , 833–839. https://doi.org/10.1002/acp.2883

Van Gog, T., Rummel, N., & Renkl, A. (2019). Learning how to solve problems by studying examples. In J. Dunlosky & K. Rawson (Eds.), The Cambridge handbook of cognition and education (pp. 183–208). Cambridge University Press.

VanLehn, K. (1999). Rule-learning events in the acquisition of a complex skill: An evaluation of cascade. The Journal of the Learning Sciences, 8 , 71–125. https://doi.org/10.1207/s15327809jls0801_3

Van Loon-Hillen, N. H., Van Gog, T., & Brand-Gruwel, S. (2012). Effects of worked examples in a primary school mathematics curriculum. Interactive Learning Environments, 20 , 89–99. https://doi.org/10.1080/10494821003755510

Van Peppen, L. M., Verkoeijen P. P. J. L., Heijltjes, A. E. G., Janssen, E. M., Koopmans, D., & Van Gog, T. (2018). Effects of self-explaining on learning and transfer of critical thinking skills. Frontiers in Education, 3 , 100. https://doi.org/10.3389/feduc.2018.00100 .

Van Peppen, L. M., Verkoeijen, P. P., Kolenbrander, S. V., Heijltjes, A. E., Janssen, E. M., & van Gog, T. (2021). Learning to avoid biased reasoning: Effects of interleaved practice and worked examples. Journal of Cognitive Psychology, 33 , 304–326. https://doi.org/10.1080/20445911.2021.1890092 .

West, R. F., Toplak, M. E., & Stanovich, K. E. (2008). Heuristics and biases as measures of critical thinking: Associations with cognitive ability and thinking dispositions. Journal of Educational Psychology, 100 , 930–941. https://doi.org/10.1037/a0012842

Wittrock, M. C. (1974). Learning as a generative process. Educational Psychologist, 11 , 87–95. https://doi.org/10.1080/00461527409529129

Wittrock, M. C. (1990). Generative processes of comprehension. Educational Psychologist, 24 , 345–376. https://doi.org/10.1207/s15326985ep2404_2

Wittrock, M. C. (1992). Generative learning processes of the brain. Educational Psychologist, 27 , 531–541. https://doi.org/10.1207/s15326985ep2704_8

Wittrock, M. C. (2010). Learning as a generative process. Educational Psychologist, 45 , 40–45. https://doi.org/10.1080/00461520903433554


Acknowledgements

This research was funded by The Netherlands Organisation for Scientific Research (Project Number 409-15-203). The authors would like to thank Stefan V. Kolenbrander for his help with running this study and Esther Stoop and Marjolein Looijen for their assistance with coding the data.


Author information

Lara M. van Peppen

Present address: Institute of Medical Education Research, Erasmus University Medical Center Rotterdam, Doctor Molewaterplein 40, 3051 GD, Rotterdam, The Netherlands

Authors and Affiliations

Department of Psychology, Education and Child Studies, Erasmus University Rotterdam, Burgemeester Oudlaan 50, 3062 PA, Rotterdam, The Netherlands

Lara M. van Peppen & Peter P. J. L. Verkoeijen

Learning and Innovation Center, Avans University of Applied Sciences, Hogeschoollaan 1, 4818 CR, Breda, The Netherlands

Peter P. J. L. Verkoeijen & Anita E. G. Heijltjes

Department of Education, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, The Netherlands

Eva M. Janssen & Tamara van Gog


Contributions

LP, PV, AH, and TG contributed to the conception and design of the study. LP and EM prepared the materials. LP collected the data, organized the database, and performed the statistical analyses. LP, PV, and TG interpreted the data. LP wrote the original draft of the manuscript and PV, AH, EM, and TG provided critical revision of the manuscript. All authors read and approved the submitted version of the manuscript.

Corresponding author

Correspondence to Lara M. van Peppen .

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Ethical approval

In accordance with the guidelines of the ethical committee at the Department of Psychology, Education and Child studies, Erasmus University Rotterdam, the study was exempt from ethical approval procedures because the materials and procedures were not invasive. All subjects gave written informed consent prior to participating in this study.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 227 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

van Peppen, L.M., Verkoeijen, P.P.J.L., Heijltjes, A.E.G. et al. Enhancing students’ critical thinking skills: is comparing correct and erroneous examples beneficial?. Instr Sci 49 , 747–777 (2021). https://doi.org/10.1007/s11251-021-09559-0


Received : 14 February 2020

Accepted : 11 August 2021

Published : 26 September 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s11251-021-09559-0


  • Critical thinking
  • Heuristics and biases
  • Transfer of learning
  • Example-based learning
  • Erroneous examples
  • Contrasting examples

OPINION article

Redefining critical thinking: teaching students to think like scientists.

Rodney M. Schmaltz*

  • Department of Psychology, MacEwan University, Edmonton, AB, Canada

From primary to post-secondary school, critical thinking (CT) is an oft-cited focus or key competency (e.g., DeAngelo et al., 2009; California Department of Education, 2014; Alberta Education, 2015; Australian Curriculum Assessment and Reporting Authority, n.d.). Unfortunately, the definition of CT has become so broad that it can encompass nearly anything and everything (e.g., Hatcher, 2000; Johnson and Hamby, 2015). From discussion of Foucault, critique and the self (Foucault, 1984) to Lawson's (1999) definition of CT as the ability to evaluate claims using psychological science, the term critical thinking has come to refer to an ever-widening range of skills and abilities. We propose that educators need to clearly define CT, and that in addition to teaching CT, a strong focus should be placed on teaching students how to think like scientists. Scientific thinking is the ability to generate, test, and evaluate claims, data, and theories (e.g., Bullock et al., 2009; Koerber et al., 2015). Simply stated, the basic tenets of scientific thinking provide students with the tools to distinguish good information from bad. Students have access to nearly limitless information, and the skills needed to recognize misinformation and questionable scientific claims are crucially important (Smith, 2011); these skills are not necessarily included in the general teaching of critical thinking (Wright, 2001).

This is an issue of more than semantics. While some definitions of CT include key elements of the scientific method (e.g., Lawson, 1999; Lawson et al., 2015), this emphasis is not consistent across all interpretations of CT (Huber and Kuncel, 2016). In an attempt to provide a comprehensive, detailed definition of CT, the American Philosophical Association (APA) outlined six CT skills, 16 subskills, and 19 dispositions (Facione, 1990). Skills include interpretation, analysis, and inference; dispositions include inquisitiveness and open-mindedness. 1 From our perspective, definitions of CT such as those provided by the APA or operationally defined by researchers in the context of a scholarly article (e.g., Forawi, 2016) are not problematic—the authors clearly define what they are referring to as CT. Potential problems arise when educators use different definitions of CT, or when the banner of CT is applied to nearly any topic or pedagogical activity. Definitions such as the APA's provide a comprehensive framework for understanding the multi-faceted nature of CT; however, the definition is complex and may be difficult to work with at a policy level for educators, especially those who work primarily with younger students.

The need to develop scientific thinking skills is evident in studies showing that 55% of undergraduate students believe that a full moon causes people to behave oddly, and an estimated 67% of students believe creatures such as Bigfoot and Chupacabra exist, despite the lack of scientific evidence supporting these claims (Lobato et al., 2014). Additionally, despite overwhelming evidence supporting the existence of anthropogenic climate change, and the dire need to mitigate its effects, many people remain skeptical of climate change and its impact (Feygina et al., 2010; Lewandowsky et al., 2013). One of the goals of education is to help students foster the skills necessary to be informed consumers of information (DeAngelo et al., 2009), and providing students with the tools to think scientifically is a crucial component of reaching this goal. By focusing on scientific thinking in conjunction with CT, educators may be better able to design specific policies that aim to build the skills students should have when they enter post-secondary training or the workforce. In other words, students should leave secondary school with the ability to rule out rival hypotheses, to understand that correlation does not equal causation, to appreciate the importance of falsifiability and replicability, to recognize extraordinary claims, and to use the principle of parsimony (e.g., Lett, 1990; Bartz, 2002).

Teaching scientific thinking is challenging, as people are vulnerable to trusting their intuitions and subjective observations and tend to prioritize them over objective scientific findings (e.g., Lilienfeld et al., 2012). Students and the public at large are prone to naïve realism, or the tendency to believe that our experiences and observations constitute objective reality (Ross and Ward, 1996), when in fact our experiences and observations are subjective and prone to error (e.g., Kahneman, 2011). Educators at the post-secondary level tend to prioritize scientific thinking (Lilienfeld, 2010); however, many students do not continue on to a post-secondary program after they have completed high school. Further, students who are told they are learning critical thinking may believe they possess the skills to accurately assess the world around them. However, if they are not taught the specific skills needed to be scientifically literate, they may still fall prey to logical fallacies and biases. People tend to underestimate or not understand fallacies that can prevent them from making sound decisions (Lilienfeld et al., 2001; Pronin et al., 2004; Lilienfeld, 2010). Thus, it is reasonable to think that a person who has not been adequately trained in scientific thinking may nonetheless consider themselves a strong critical thinker, and would therefore be even less likely to consider their own personal biases. Another concern is that when teaching scientific thinking there is always the risk that students become overly critical or cynical (e.g., Mercier et al., 2017); that is, a student may become skeptical of nearly all findings, regardless of the supporting evidence. By incorporating and focusing on cognitive biases, instructors can help students understand their own biases and demonstrate how the rigor of the scientific method can, at least partially, control for these biases.

Teaching CT remains controversial and confusing for many instructors ( Bensley and Murtagh, 2012 ). This is partly due to the lack of clarity in the definition of CT and the wide range of methods proposed to best teach CT ( Abrami et al., 2008 ; Bensley and Murtagh, 2012 ). For instance, Bensley and Spero (2014) found evidence for the effectiveness of direct approaches to teaching CT, a claim echoed in earlier research ( Abrami et al., 2008 ; Marin and Halpern, 2011 ). Despite their positive findings, some studies have failed to find support for measures of CT ( Burke et al., 2014 ) and others have found variable, yet positive, support for instructional methods ( Dochy et al., 2003 ). Unfortunately, there is a lack of research demonstrating the best pedagogical approaches to teaching scientific thinking at different grade levels. More research is needed to provide an empirically grounded approach to teach scientific thinking, and there is also a need to develop evidence based measures of scientific thinking that are grade and age appropriate. One approach to teaching scientific thinking may be to frame the topic in its simplest terms—the ability to “detect baloney” ( Sagan, 1995 ).

Sagan (1995) has promoted the tools necessary to recognize poor arguments, fallacies to avoid, and how to approach claims using the scientific method. The basic tenets of Sagan's argument apply to most claims, and have the potential to be an effective teaching tool across a range of abilities and ages. Sagan discusses the idea of a baloney detection kit, which contains the “tools” for skeptical thinking. The development of “baloney detection kits” which include age-appropriate scientific thinking skills may be an effective approach to teaching scientific thinking. These kits could include the style of exercises that are typically found under the banner of CT training (e.g., group discussions, evaluations of arguments) with a focus on teaching scientific thinking. An empirically validated kit does not yet exist, though there is much to draw from in the literature on pedagogical approaches to correcting cognitive biases, combatting pseudoscience, and teaching methodology (e.g., Smith, 2011 ). Further research is needed in this area to ensure that the correct, and age-appropriate, tools are part of any baloney detection kit.

Teaching Sagan's idea of baloney detection in conjunction with CT provides educators with a clear focus—to employ a pedagogical approach that helps students create sound and cogent arguments while avoiding falling prey to “baloney”. This is not to say that all of the information taught under the current banner of “critical thinking” is without value. In fact, many of the topics taught under the current approach of CT are important, even though they would not fit within the framework of some definitions of critical thinking. If educators want to ensure that students have the ability to be accurate consumers of information, a focus should be placed on including scientific thinking as a component of the science curriculum, as well as part of the broader teaching of CT.

Educators need to be provided with evidence-based approaches to teach the principles of scientific thinking. These principles should be taught in conjunction with evidence-based methods that mitigate the potential for fallacious reasoning and false beliefs. At a minimum, when students first learn about science, there should also be an introduction to the basic tenets of scientific thinking. Courses dedicated to promoting scientific thinking may also be effective. A course focused on cognitive biases, logical fallacies, and the hallmarks of scientific thinking, adapted for each grade level, may provide students with the foundation of solid scientific thinking skills needed to produce and evaluate arguments, and allow scientific thinking to be extended into other scholastic areas and classes. Evaluations of the efficacy of these courses would be essential, along with research to determine the best approach to incorporating scientific thinking into the curriculum.

If instructors know that students have at least some familiarity with the fundamental tenets of scientific thinking, they can expand and build upon these ideas in a variety of subject-specific areas, further fostering and promoting these skills. For example, when discussing climate change, an instructor could add a brief discussion of why some people reject the science of climate change by relating this back to the information students will be familiar with from their scientific thinking courses. In terms of an issue like climate change, many students may have heard in political debates or popular culture that global warming trends are not real, or a “hoax” (Lewandowsky et al., 2013). In this case, only teaching the data and facts may not be sufficient to change a student's mind about the reality of climate change (Lewandowsky et al., 2012). Instructors would have more success by presenting students with the data on global warming trends as well as information on the biases that could lead some people to reject the data (Kowalski and Taylor, 2009; Lewandowsky et al., 2012). This type of instruction helps educators create informed citizens who are better able to guide future decision making and ensures that students enter the job market with the skills needed to be valuable members of the workforce and society as a whole.

By promoting scientific thinking, educators can ensure that students are at least exposed to the basic tenets of what makes a good argument, how to create their own arguments, how to recognize their own biases and those of others, and how to think like a scientist. There is still work to be done: educational programs built on empirical evidence need to be put in place, and research is needed to investigate specific techniques for promoting scientific thinking in earlier grade levels and to develop measures that test whether students have acquired the necessary scientific thinking skills. By using an evidence-based approach to implement strategies that promote scientific thinking, and encouraging researchers to further explore the ideal methods for doing so, educators can better serve their students. When students are provided with the core ideas of how to detect baloney, and with examples of how baloney detection relates to the real world (e.g., Schmaltz and Lilienfeld, 2014), we are confident that they will be better able to navigate the oceans of information available and choose the right path when deciding whether information is valid.

Author Contribution

RS was the lead author of this paper, and both EJ and NW contributed equally.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1. ^ There is some debate about the role of dispositional factors in the ability for a person to engage in critical thinking, specifically that dispositional factors may mitigate any attempt to learn CT. The general consensus is that while dispositional traits may play a role in the ability to think critically, the general skills to be a critical thinker can be taught ( Niu et al., 2013 ; Abrami et al., 2015 ).

Abrami, P. C., Bernard, R. M., Borokhovski, E., Waddington, D. I., Wade, C. A., and Persson, T. (2015). Strategies for teaching students to think critically: a meta-analysis. Rev. Educ. Res. 85, 275–314. doi: 10.3102/0034654314551063


Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., et al. (2008). Instructional interventions affecting critical thinking skills and dispositions: a stage 1 meta-analysis. Rev. Educ. Res. 78, 1102–1134. doi: 10.3102/0034654308326084

Alberta Education (2015). Ministerial Order on Student Learning . Available online at: https://education.alberta.ca/policies-and-standards/student-learning/everyone/ministerial-order-on-student-learning-pdf/

Australian Curriculum Assessment and Reporting Authority (n.d.). Available online at: http://www.australiancurriculum.edu.au

Bartz, W. R. (2002). Teaching skepticism via the CRITIC acronym and the skeptical inquirer. Skeptical Inquirer 17, 42–44.


Bensley, D. A., and Murtagh, M. P. (2012). Guidelines for a scientific approach to critical thinking assessment. Teach. Psychol. 39, 5–16. doi: 10.1177/0098628311430642

Bensley, D. A., and Spero, R. A. (2014). Improving critical thinking skills and metacognitive monitoring through direct infusion. Think. Skills Creativ. 12, 55–68. doi: 10.1016/j.tsc.2014.02.001

Bullock, M., Sodian, B., and Koerber, S. (2009). “Doing experiments and understanding science: development of scientific reasoning from childhood to adulthood,” in Human Development from Early Childhood to Early Adulthood: Findings from a 20 Year Longitudinal Study , eds W. Schneider and M. Bullock (New York, NY: Psychology Press), 173–197.

Burke, B. L., Sears, S. R., Kraus, S., and Roberts-Cady, S. (2014). Critical analysis: a comparison of critical thinking changes in psychology and philosophy classes. Teach. Psychol. 41, 28–36. doi: 10.1177/0098628313514175

California Department of Education (2014). Standard for Career Ready Practice . Available online at: http://www.cde.ca.gov/nr/ne/yr14/yr14rel22.asp

DeAngelo, L., Hurtado, S., Pryor, J. H., Kelly, K. R., Santos, J. L., and Korn, W. S. (2009). The American College Teacher: National Norms for the 2007-2008 HERI Faculty Survey . Los Angeles, CA: Higher Education Research Institute.

Dochy, F., Segers, M., Van den Bossche, P., and Gijbels, D. (2003). Effects of problem-based learning: a meta-analysis. Learn. Instruct. 13, 533–568. doi: 10.1016/S0959-4752(02)00025-7

Facione, P. A. (1990). Critical thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction. Research Findings and Recommendations. Newark, DE: American Philosophical Association.

Feygina, I., Jost, J. T., and Goldsmith, R. E. (2010). System justification, the denial of global warming, and the possibility of ‘system-sanctioned change’. Pers. Soc. Psychol. Bull. 36, 326–338. doi: 10.1177/0146167209351435


Forawi, S. A. (2016). Standard-based science education and critical thinking. Think. Skills Creativ. 20, 52–62. doi: 10.1016/j.tsc.2016.02.005

Foucault, M. (1984). The Foucault Reader . New York, NY: Pantheon.

Hatcher, D. L. (2000). Arguments for another definition of critical thinking. Inquiry 20, 3–8. doi: 10.5840/inquiryctnews20002016

Huber, C. R., and Kuncel, N. R. (2016). Does college teach critical thinking? A meta-analysis. Rev. Educ. Res. 86, 431–468. doi: 10.3102/0034654315605917

Johnson, R. H., and Hamby, B. (2015). A meta-level approach to the problem of defining “Critical Thinking”. Argumentation 29, 417–430. doi: 10.1007/s10503-015-9356-4

Kahneman, D. (2011). Thinking, Fast and Slow . New York, NY: Farrar, Straus and Giroux.

Koerber, S., Mayer, D., Osterhaus, C., Schwippert, K., and Sodian, B. (2015). The development of scientific thinking in elementary school: a comprehensive inventory. Child Dev. 86, 327–336. doi: 10.1111/cdev.12298

Kowalski, P., and Taylor, A. K. (2009). The effect of refuting misconceptions in the introductory psychology class. Teach. Psychol. 36, 153–159. doi: 10.1080/00986280902959986

Lawson, T. J. (1999). Assessing psychological critical thinking as a learning outcome for psychology majors. Teach. Psychol. 26, 207–209. doi: 10.1207/S15328023TOP260311


Lawson, T. J., Jordan-Fleming, M. K., and Bodle, J. H. (2015). Measuring psychological critical thinking: an update. Teach. Psychol. 42, 248–253. doi: 10.1177/0098628315587624

Lett, J. (1990). A field guide to critical thinking. Skeptical Inquirer , 14, 153–160.

Lewandowsky, S., Ecker, U. H., Seifert, C. M., Schwarz, N., and Cook, J. (2012). Misinformation and its correction: continued influence and successful debiasing. Psychol. Sci. Public Interest 13, 106–131. doi: 10.1177/1529100612451018

Lewandowsky, S., Oberauer, K., and Gignac, G. E. (2013). NASA faked the moon landing—therefore, (climate) science is a hoax: an anatomy of the motivated rejection of science. Psychol. Sci. 24, 622–633. doi: 10.1177/0956797612457686

Lilienfeld, S. O. (2010). Can psychology become a science? Pers. Individ. Dif. 49, 281–288. doi: 10.1016/j.paid.2010.01.024

Lilienfeld, S. O., Ammirati, R., and David, M. (2012). Distinguishing science from pseudoscience in school psychology: science and scientific thinking as safeguards against human error. J. Sch. Psychol. 50, 7–36. doi: 10.1016/j.jsp.2011.09.006

Lilienfeld, S. O., Lohr, J. M., and Morier, D. (2001). The teaching of courses in the science and pseudoscience of psychology: useful resources. Teach. Psychol. 28, 182–191. doi: 10.1207/S15328023TOP2803_03

Lobato, E., Mendoza, J., Sims, V., and Chin, M. (2014). Examining the relationship between conspiracy theories, paranormal beliefs, and pseudoscience acceptance among a university population. Appl. Cogn. Psychol. 28, 617–625. doi: 10.1002/acp.3042

Marin, L. M., and Halpern, D. F. (2011). Pedagogy for developing critical thinking in adolescents: explicit instruction produces greatest gains. Think. Skills Creativ. 6, 1–13. doi: 10.1016/j.tsc.2010.08.002

Mercier, H., Boudry, M., Paglieri, F., and Trouche, E. (2017). Natural-born arguers: teaching how to make the best of our reasoning abilities. Educ. Psychol. 52, 1–16. doi: 10.1080/00461520.2016.1207537

Niu, L., Behar-Horenstein, L. S., and Garvan, C. W. (2013). Do instructional interventions influence college students' critical thinking skills? A meta-analysis. Educ. Res. Rev. 9, 114–128. doi: 10.1016/j.edurev.2012.12.002

Pronin, E., Gilovich, T., and Ross, L. (2004). Objectivity in the eye of the beholder: divergent perceptions of bias in self versus others. Psychol. Rev. 111, 781–799. doi: 10.1037/0033-295X.111.3.781

Ross, L., and Ward, A. (1996). “Naive realism in everyday life: implications for social conflict and misunderstanding,” in Values and Knowledge , eds E. S. Reed, E. Turiel, T. Brown, E. S. Reed, E. Turiel and T. Brown (Hillsdale, NJ: Lawrence Erlbaum Associates Inc.), 103–135.

Sagan, C. (1995). Demon-Haunted World: Science as a Candle in the Dark . New York, NY: Random House.

Schmaltz, R., and Lilienfeld, S. O. (2014). Hauntings, homeopathy, and the Hopkinsville Goblins: using pseudoscience to teach scientific thinking. Front. Psychol. 5:336. doi: 10.3389/fpsyg.2014.00336

Smith, J. C. (2011). Pseudoscience and Extraordinary Claims of the Paranormal: A Critical Thinker's Toolkit . New York, NY: John Wiley and Sons.

Wright, I. (2001). Critical thinking in the schools: why doesn't much happen? Inform. Logic 22, 137–154. doi: 10.22329/il.v22i2.2579

Keywords: scientific thinking, critical thinking, teaching resources, skepticism, education policy

Citation: Schmaltz RM, Jansen E and Wenckowski N (2017) Redefining Critical Thinking: Teaching Students to Think like Scientists. Front. Psychol . 8:459. doi: 10.3389/fpsyg.2017.00459

Received: 13 December 2016; Accepted: 13 March 2017; Published: 29 March 2017.


Copyright © 2017 Schmaltz, Jansen and Wenckowski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rodney M. Schmaltz, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


  • Review Article
  • Open access
  • Published: 11 January 2023

The effectiveness of collaborative problem solving in promoting students’ critical thinking: A meta-analysis based on empirical literature

  • Enwei Xu   ORCID: orcid.org/0000-0001-6424-8169 1 ,
  • Wei Wang 1 &
  • Qingxia Wang 1  

Humanities and Social Sciences Communications volume  10 , Article number:  16 ( 2023 ) Cite this article

15k Accesses

14 Citations

3 Altmetric

Metrics details

  • Science, technology and society

Collaborative problem-solving has been widely embraced in the classroom instruction of critical thinking, which is regarded as the core of curriculum reform based on key competencies in the field of education as well as a key competence for learners in the 21st century. However, the effectiveness of collaborative problem-solving in promoting students’ critical thinking remains uncertain. This study presents the major findings of a meta-analysis of 36 empirical studies published in international educational journals during the 21st century, conducted to identify the effectiveness of collaborative problem-solving in promoting students’ critical thinking and to determine, based on evidence, whether and to what extent collaborative problem-solving increases or decreases critical thinking. The findings show that (1) collaborative problem-solving is an effective teaching approach for fostering students’ critical thinking, with a significant overall effect size (ES = 0.82, z = 12.78, P < 0.01, 95% CI [0.69, 0.95]); (2) with respect to the dimensions of critical thinking, collaborative problem-solving can significantly and successfully enhance students’ attitudinal tendency (ES = 1.17, z = 7.62, P < 0.01, 95% CI [0.87, 1.47]); nevertheless, it falls short in terms of improving students’ cognitive skills, having only an upper-middle impact (ES = 0.70, z = 11.55, P < 0.01, 95% CI [0.58, 0.82]); and (3) the teaching type (chi 2 = 7.20, P < 0.05), intervention duration (chi 2 = 12.18, P < 0.01), subject area (chi 2 = 13.36, P < 0.05), group size (chi 2 = 8.77, P < 0.05), and learning scaffold (chi 2 = 9.03, P < 0.01) all have an impact on critical thinking and can be viewed as important moderating factors that affect how critical thinking develops. On the basis of these results, recommendations are made for further research and instruction to better support students’ critical thinking in the context of collaborative problem-solving.

Similar content being viewed by others

Fostering twenty-first century skills among primary school students through math project-based learning

A meta-analysis to gauge the impact of pedagogies employed in mixed-ability high school biology classrooms

A guide to critical thinking: implications for dental education

Introduction

Although critical thinking has a long history in research, the concept of critical thinking, which is regarded as an essential competence for learners in the 21st century, has recently attracted more attention from researchers and teaching practitioners (National Research Council, 2012 ). Critical thinking should be the core of curriculum reform based on key competencies in the field of education (Peng and Deng, 2017 ) because students with critical thinking can not only understand the meaning of knowledge but also effectively solve practical problems in real life even after knowledge is forgotten (Kek and Huijser, 2011 ). The definition of critical thinking is not universal (Ennis, 1989 ; Castle, 2009 ; Niu et al., 2013 ). In general, the definition of critical thinking is a self-aware and self-regulated thought process (Facione, 1990 ; Niu et al., 2013 ). It refers to the cognitive skills needed to interpret, analyze, synthesize, reason, and evaluate information as well as the attitudinal tendency to apply these abilities (Halpern, 2001 ). The view that critical thinking can be taught and learned through curriculum teaching has been widely supported by many researchers (e.g., Kuncel, 2011 ; Leng and Lu, 2020 ), leading to educators’ efforts to foster it among students. In the field of teaching practice, there are three types of courses for teaching critical thinking (Ennis, 1989 ). The first is an independent curriculum in which critical thinking is taught and cultivated without involving the knowledge of specific disciplines; the second is an integrated curriculum in which critical thinking is integrated into the teaching of other disciplines as a clear teaching goal; and the third is a mixed curriculum in which critical thinking is taught in parallel to the teaching of other disciplines for mixed teaching training. Furthermore, numerous measuring tools have been developed by researchers and educators to measure critical thinking in the context of teaching practice. These include standardized measurement tools, such as WGCTA, CCTST, CCTT, and CCTDI, which have been verified by repeated experiments and are considered effective and reliable by international scholars (Facione and Facione, 1992 ). In short, descriptions of critical thinking, including its two dimensions of attitudinal tendency and cognitive skills, different types of teaching courses, and standardized measurement tools provide a complex normative framework for understanding, teaching, and evaluating critical thinking.

Cultivating critical thinking in curriculum teaching can start with a problem, and one of the most popular critical thinking instructional approaches is problem-based learning (Liu et al., 2020 ). Duch et al. ( 2001 ) noted that problem-based learning in group collaboration is progressive active learning, which can improve students’ critical thinking and problem-solving skills. Collaborative problem-solving is the organic integration of collaborative learning and problem-based learning, which takes learners as the center of the learning process and uses problems with poor structure in real-world situations as the starting point for the learning process (Liang et al., 2017 ). Students learn the knowledge needed to solve problems in a collaborative group, reach a consensus on problems in the field, and form solutions through social cooperation methods, such as dialogue, interpretation, questioning, debate, negotiation, and reflection, thus promoting the development of learners’ domain knowledge and critical thinking (Cindy, 2004 ; Liang et al., 2017 ).

Collaborative problem-solving has been widely used in the teaching practice of critical thinking, and several studies have attempted systematic reviews and meta-analyses of the empirical literature on critical thinking from various perspectives. However, little attention has been paid to the impact of collaborative problem-solving on critical thinking. Examining how critical thinking instruction should be implemented within collaborative problem-solving would be the most direct way to develop and enhance critical thinking, yet this issue remains largely unexplored, which means that many teachers lack guidance for teaching critical thinking effectively (Leng and Lu, 2020 ; Niu et al., 2013 ). For example, Huber ( 2016 ) reported meta-analytic findings from 71 publications on gains in critical thinking over various time frames in college, with the aim of determining whether critical thinking is truly teachable. These authors found that learners significantly improve their critical thinking while in college and that gains differ with factors such as teaching strategies, intervention duration, subject area, and teaching type. However, that study did not determine the usefulness of collaborative problem-solving in fostering students’ critical thinking, nor did it reveal whether there were significant variations among the different elements. A meta-analysis of 31 pieces of educational literature was conducted by Liu et al. ( 2020 ) to assess the impact of problem-solving on college students’ critical thinking. These authors found that problem-solving could promote the development of critical thinking among college students and proposed establishing a reasonable group structure for problem-solving in follow-up studies to improve students’ critical thinking. Additionally, previous empirical studies have reached inconclusive and even contradictory conclusions about whether and to what extent collaborative problem-solving increases or decreases critical thinking levels. As an illustration, Yang et al. ( 2008 ) carried out an experiment on integrated curriculum teaching with college students based on a web bulletin board, with the goal of fostering participants’ critical thinking in the context of collaborative problem-solving. Their research revealed that, through sharing, debating, examining, and reflecting on various experiences and ideas, collaborative problem-solving can considerably enhance students’ critical thinking in real-life problem situations. In contrast, according to research by Naber and Wyatt ( 2014 ) and Sendag and Odabasi ( 2009 ) on undergraduate and high school students, respectively, collaborative problem-solving had a positive impact on learners’ interaction and could improve learning interest and motivation, but it could not significantly improve students’ critical thinking when compared with traditional classroom teaching.

The above studies show that there is inconsistency regarding the effectiveness of collaborative problem-solving in promoting students’ critical thinking. Therefore, it is essential to conduct a thorough and trustworthy review to determine whether, and to what degree, collaborative problem-solving leads to an increase or decrease in critical thinking. Meta-analysis is a quantitative approach used to examine data from multiple separate studies that all address the same research topic. It characterizes the overall effectiveness of an intervention by averaging the effect sizes of numerous individual quantitative studies, thereby reducing the uncertainty associated with any single study and producing more conclusive findings (Lipsey and Wilson, 2001 ).

This paper used a meta-analytic approach and carried out a meta-analysis to examine the effectiveness of collaborative problem-solving in promoting students’ critical thinking in order to make a contribution to both research and practice. The following research questions were addressed by this meta-analysis:

What is the overall effect size of collaborative problem-solving in promoting students’ critical thinking and its impact on the two dimensions of critical thinking (i.e., attitudinal tendency and cognitive skills)?

How are the disparities between the study conclusions impacted by various moderating variables if the impacts of various experimental designs in the included studies are heterogeneous?

This research followed the strict procedures (e.g., database searching, identification, screening, eligibility checking, merging, duplicate removal, and analysis of included studies) of the meta-analysis approach proposed by Cooper ( 2010 ) for examining quantitative data from separate studies that all address the same research topic. The relevant empirical research published in international educational journals during the 21st century was subjected to this meta-analysis using Rev-Man 5.4. The consistency of the data extracted separately by two researchers was tested using Cohen’s kappa coefficient, and a publication bias test and a heterogeneity test were run on the sample data to ascertain the quality of this meta-analysis.
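To illustrate the kind of inter-rater agreement check described above, the following minimal sketch computes Cohen’s kappa for two coders’ screening decisions. The ratings are hypothetical placeholders rather than the actual coding data, and the computation is independent of Rev-Man.

```python
# Minimal sketch: Cohen's kappa for two coders' include/exclude decisions.
# The ratings below are hypothetical placeholders, not the study's data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two coders rated independently.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

coder1 = ["include", "exclude", "include", "include", "exclude"]
coder2 = ["include", "exclude", "include", "exclude", "exclude"]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.62 for this toy example
```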

Data sources and search strategies

There were three stages to the data collection process for this meta-analysis, as shown in Fig. 1 , which shows the number of articles included and eliminated during the selection process based on the statement and study eligibility criteria.

Figure 1. Flowchart of the number of records identified, included, and excluded during study selection.

First, the databases used to systematically search for relevant articles were the journal papers of the Web of Science Core Collection and the Chinese Core source journal, as well as the Chinese Social Science Citation Index (CSSCI) source journal papers included in CNKI. These databases were selected because they are credible platforms that are sources of scholarly and peer-reviewed information with advanced search tools and contain literature relevant to the subject of our topic from reliable researchers and experts. The search string with the Boolean operator used in the Web of Science was “TS = (((“critical thinking” or “ct” and “pretest” or “posttest”) or (“critical thinking” or “ct” and “control group” or “quasi experiment” or “experiment”)) and (“collaboration” or “collaborative learning” or “CSCL”) and (“problem solving” or “problem-based learning” or “PBL”))”. The research area was “Education Educational Research”, and the search period was “January 1, 2000, to December 30, 2021”. A total of 412 papers were obtained. The search string with the Boolean operator used in the CNKI was “SU = (‘critical thinking’*‘collaboration’ + ‘critical thinking’*‘collaborative learning’ + ‘critical thinking’*‘CSCL’ + ‘critical thinking’*‘problem solving’ + ‘critical thinking’*‘problem-based learning’ + ‘critical thinking’*‘PBL’ + ‘critical thinking’*‘problem oriented’) AND FT = (‘experiment’ + ‘quasi experiment’ + ‘pretest’ + ‘posttest’ + ‘empirical study’)” (translated into Chinese when searching). A total of 56 studies were found throughout the search period of “January 2000 to December 2021”. From the databases, all duplicates and retractions were eliminated before exporting the references into Endnote, a program for managing bibliographic references. In all, 466 studies were found.

Second, the studies that matched the inclusion and exclusion criteria for the meta-analysis were chosen by two researchers after they had reviewed the abstracts and titles of the gathered articles, yielding a total of 126 studies.

Third, two researchers thoroughly reviewed each included article’s whole text in accordance with the inclusion and exclusion criteria. Meanwhile, a snowball search was performed using the references and citations of the included articles to ensure complete coverage of the articles. Ultimately, 36 articles were kept.

Two researchers worked together to carry out this entire process, and a consensus rate of approximately 94.7% was reached after discussion and negotiation to resolve any emerging differences.

Eligibility criteria

Since not all the retrieved studies matched the criteria for this meta-analysis, eligibility criteria for both inclusion and exclusion were developed as follows:

The publication language of the included studies was limited to English and Chinese, and the full text could be obtained. Articles that did not meet the publication language and articles not published between 2000 and 2021 were excluded.

The research design of the included studies must be empirical and quantitative studies that can assess the effect of collaborative problem-solving on the development of critical thinking. Articles that could not identify the causal mechanisms by which collaborative problem-solving affects critical thinking, such as review articles and theoretical articles, were excluded.

The research method of the included studies must feature a randomized control experiment or a quasi-experiment, or a natural experiment, which have a higher degree of internal validity with strong experimental designs and can all plausibly provide evidence that critical thinking and collaborative problem-solving are causally related. Articles with non-experimental research methods, such as purely correlational or observational studies, were excluded.

The participants of the included studies were only students in school, including K-12 students and college students. Articles in which the participants were non-school students, such as social workers or adult learners, were excluded.

The research results of the included studies must mention definite signs that may be utilized to gauge critical thinking’s impact (e.g., sample size, mean value, or standard deviation). Articles that lacked specific measurement indicators for critical thinking and could not calculate the effect size were excluded.

Data coding design

In order to perform a meta-analysis, it is necessary to collect the most important information from the articles, codify that information’s properties, and convert descriptive data into quantitative data. Therefore, this study designed a data coding template (see Table 1 ). Ultimately, 16 coding fields were retained.

The designed data-coding template consisted of three pieces of information. Basic information about the papers was included in the descriptive information: the publishing year, author, serial number, and title of the paper.

The variable information for the experimental design had three variables: the independent variable (instruction method), the dependent variable (critical thinking), and the moderating variable (learning stage, teaching type, intervention duration, learning scaffold, group size, measuring tool, and subject area). Depending on the topic of this study, the intervention strategy, as the independent variable, was coded into collaborative and non-collaborative problem-solving. The dependent variable, critical thinking, was coded as a cognitive skill and an attitudinal tendency. And seven moderating variables were created by grouping and combining the experimental design variables discovered within the 36 studies (see Table 1 ), where learning stages were encoded as higher education, high school, middle school, and primary school or lower; teaching types were encoded as mixed courses, integrated courses, and independent courses; intervention durations were encoded as 0–1 weeks, 1–4 weeks, 4–12 weeks, and more than 12 weeks; group sizes were encoded as 2–3 persons, 4–6 persons, 7–10 persons, and more than 10 persons; learning scaffolds were encoded as teacher-supported learning scaffold, technique-supported learning scaffold, and resource-supported learning scaffold; measuring tools were encoded as standardized measurement tools (e.g., WGCTA, CCTT, CCTST, and CCTDI) and self-adapting measurement tools (e.g., modified or made by researchers); and subject areas were encoded according to the specific subjects used in the 36 included studies.

The data information contained three metrics for measuring critical thinking: sample size, average value, and standard deviation. It is important to note that studies with different experimental designs often require different formulas to determine the effect size; this paper used the standardized mean difference (SMD) formula proposed by Morris ( 2008 , p. 369; see Supplementary Table S3 ).
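The exact formula used in this paper is given in Supplementary Table S3. For orientation, the commonly cited form of Morris’s (2008) pretest-posttest-control effect size, with its small-sample correction, is sketched below; this is an illustration of the general approach, not a transcription of the supplementary material.

```latex
% Commonly cited form of Morris's (2008) d_ppc2 (illustrative sketch)
d_{ppc2} = c_p \,
  \frac{\left(M_{\mathrm{post},T} - M_{\mathrm{pre},T}\right) - \left(M_{\mathrm{post},C} - M_{\mathrm{pre},C}\right)}
       {SD_{\mathrm{pre,pooled}}},
\quad
c_p = 1 - \frac{3}{4\,(n_T + n_C - 2) - 1},
\quad
SD_{\mathrm{pre,pooled}} = \sqrt{\frac{(n_T - 1)\,SD_{\mathrm{pre},T}^{2} + (n_C - 1)\,SD_{\mathrm{pre},C}^{2}}{n_T + n_C - 2}}
```

Here T and C denote the treatment and control groups; the pre-to-post gain of the control group is subtracted from that of the treatment group and scaled by the pooled pretest standard deviation.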

Procedure for extracting and coding data

According to the data coding template (see Table 1 ), the 36 papers’ information was retrieved by two researchers, who then entered them into Excel (see Supplementary Table S1 ). The results of each study were extracted separately in the data extraction procedure if an article contained numerous studies on critical thinking, or if a study assessed different critical thinking dimensions. For instance, Tiwari et al. ( 2010 ) used four time points, which were viewed as numerous different studies, to examine the outcomes of critical thinking, and Chen ( 2013 ) included the two outcome variables of attitudinal tendency and cognitive skills, which were regarded as two studies. After discussion and negotiation during data extraction, the two researchers’ consistency test coefficients were roughly 93.27%. Supplementary Table S2 details the key characteristics of the 36 included articles with 79 effect quantities, including descriptive information (e.g., the publishing year, author, serial number, and title of the paper), variable information (e.g., independent variables, dependent variables, and moderating variables), and data information (e.g., mean values, standard deviations, and sample size). Following that, testing for publication bias and heterogeneity was done on the sample data using the Rev-Man 5.4 software, and then the test results were used to conduct a meta-analysis.

Publication bias test

Publication bias arises when the sample of studies included in a meta-analysis does not accurately reflect the overall body of research on the relevant subject, and it can compromise the reliability and accuracy of the meta-analysis. For this reason, the sample data must be checked for publication bias (Stewart et al., 2006 ). A popular method for doing so is the funnel plot: publication bias is unlikely when the effect sizes are distributed roughly symmetrically on either side of the average effect size and are concentrated in the upper (more precise) region of the plot. The funnel plot for this analysis (see Fig. 2 ) shows that the data are evenly dispersed within the upper portion of the plot, indicating that publication bias is unlikely in this case.

Figure 2. Funnel plot used to check for publication bias across the 79 effect quantities from the 36 included studies.
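As an illustration of how such a funnel plot can be produced outside Rev-Man, the sketch below plots each study’s effect size against its standard error, with the vertical axis inverted so that more precise studies sit near the top. The two arrays are hypothetical placeholders, not the 79 effect quantities analysed here.

```python
# Minimal funnel-plot sketch: effect size vs. standard error, y-axis inverted
# so that more precise studies appear near the top. Data are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

effect_sizes = np.array([0.6, 0.9, 0.8, 1.1, 0.7, 0.5, 1.0])          # placeholder SMDs
standard_errors = np.array([0.30, 0.25, 0.15, 0.28, 0.10, 0.22, 0.18])

# Inverse-variance weighted mean, drawn as a vertical reference line.
pooled = np.average(effect_sizes, weights=1 / standard_errors**2)

plt.scatter(effect_sizes, standard_errors)
plt.axvline(pooled, linestyle="--")
plt.gca().invert_yaxis()
plt.xlabel("Standardized mean difference")
plt.ylabel("Standard error")
plt.title("Funnel plot (illustrative data)")
plt.show()
```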

Heterogeneity test

To select the appropriate effect model for the meta-analysis, the results of a heterogeneity test on the effect sizes can be used. In a meta-analysis, it is common practice to gauge the degree of heterogeneity using the I2 value; I2 ≥ 50% is typically understood to denote medium-to-high heterogeneity, which calls for a random-effects model, whereas a fixed-effects model should otherwise be applied (Lipsey and Wilson, 2001 ). The heterogeneity test in this paper (see Table 2 ) revealed that I2 was 86%, indicating significant heterogeneity ( P < 0.01). To ensure accuracy and reliability, the overall effect size was therefore calculated using the random-effects model.
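For readers who wish to reproduce this step outside Rev-Man, a minimal sketch of Cochran’s Q and the I2 statistic is shown below. The effect sizes and within-study variances are hypothetical placeholders, not the study’s data.

```python
# Minimal heterogeneity-test sketch: Cochran's Q and Higgins' I^2.
# Effect sizes (yi) and within-study variances (vi) are hypothetical.
import numpy as np

yi = np.array([0.6, 0.9, 0.8, 1.1, 0.7, 0.5, 1.0])
vi = np.array([0.09, 0.06, 0.02, 0.08, 0.01, 0.05, 0.03])

wi = 1 / vi                               # fixed-effect (inverse-variance) weights
fixed_mean = np.sum(wi * yi) / np.sum(wi)
Q = np.sum(wi * (yi - fixed_mean) ** 2)   # Cochran's Q
df = len(yi) - 1
I2 = max(0.0, (Q - df) / Q) * 100         # Higgins' I^2, in percent

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%")
# I^2 >= 50% would suggest medium-to-high heterogeneity and a random-effects model.
```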

The analysis of the overall effect size

This meta-analysis used a random-effects model to examine the 79 effect quantities from the 36 studies, given the heterogeneity identified above. In accordance with Cohen’s criterion (Cohen, 1992 ), the analysis results, shown in the forest plot of the overall effect (see Fig. 3 ), make clear that the overall effect size of collaborative problem-solving is 0.82, which is statistically significant ( z = 12.78, P < 0.01, 95% CI [0.69, 0.95]) and indicates that it can encourage learners to practice critical thinking.

Figure 3. Forest plot of the overall effect size across the 36 included studies.

In addition, this study examined the two distinct dimensions of critical thinking to better understand the precise contributions that collaborative problem-solving makes to its growth. The findings (see Table 3 ) indicate that collaborative problem-solving improves both cognitive skills (ES = 0.70) and attitudinal tendency (ES = 1.17), with significant intergroup differences (chi 2 = 7.95, P < 0.01). Although collaborative problem-solving improves both dimensions of critical thinking, it is essential to point out that the improvement in students’ attitudinal tendency is much more pronounced, with a significant comprehensive effect (ES = 1.17, z = 7.62, P < 0.01, 95% CI [0.87, 1.47]), whereas the gain in learners’ cognitive skills is more modest, reaching only an upper-middle level (ES = 0.70, z = 11.55, P < 0.01, 95% CI [0.58, 0.82]).
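The pooled estimates reported above (an effect size with a z value and 95% CI) can be obtained with a random-effects model; the sketch below uses the common DerSimonian-Laird estimator on hypothetical inputs and is intended only to illustrate the computation, not to reproduce the Rev-Man output.

```python
# Minimal DerSimonian-Laird random-effects pooling sketch (hypothetical data).
import numpy as np

yi = np.array([0.6, 0.9, 0.8, 1.1, 0.7, 0.5, 1.0])          # per-study SMDs (placeholders)
vi = np.array([0.09, 0.06, 0.02, 0.08, 0.01, 0.05, 0.03])   # within-study variances

wi = 1 / vi
fixed_mean = np.sum(wi * yi) / np.sum(wi)
Q = np.sum(wi * (yi - fixed_mean) ** 2)
df = len(yi) - 1
c = np.sum(wi) - np.sum(wi**2) / np.sum(wi)
tau2 = max(0.0, (Q - df) / c)                # between-study variance (DL estimator)

wi_star = 1 / (vi + tau2)                    # random-effects weights
es = np.sum(wi_star * yi) / np.sum(wi_star)  # pooled effect size
se = np.sqrt(1 / np.sum(wi_star))
z = es / se
lo, hi = es - 1.96 * se, es + 1.96 * se
print(f"ES = {es:.2f}, z = {z:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```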

The analysis of moderator effect size

The whole forest plot’s 79 effect quantities underwent a two-tailed test, which revealed significant heterogeneity (I2 = 86%, z = 12.78, P < 0.01), indicating differences between the effect sizes that may have been influenced by moderating factors other than sampling error. Therefore, subgroup analysis was used to explore possible moderating factors that might produce this heterogeneity, such as the learning stage, learning scaffold, teaching type, group size, intervention duration, measuring tool, and subject area coded across the 36 experimental designs, in order to further identify the key factors that influence critical thinking. The findings (see Table 4 ) indicate that the various moderating factors have advantageous effects on critical thinking. Specifically, the subject area (chi 2 = 13.36, P < 0.05), group size (chi 2 = 8.77, P < 0.05), intervention duration (chi 2 = 12.18, P < 0.01), learning scaffold (chi 2 = 9.03, P < 0.01), and teaching type (chi 2 = 7.20, P < 0.05) are all significant moderators that can be applied to support the cultivation of critical thinking. However, since the learning stage and the measuring tool did not show significant intergroup differences (chi 2 = 3.15, P = 0.21 > 0.05, and chi 2 = 0.08, P = 0.78 > 0.05), we cannot conclude that these two factors play a crucial role in supporting the cultivation of critical thinking in the context of collaborative problem-solving. The precise outcomes are as follows:

Various learning stages influenced critical thinking positively, without significant intergroup differences (chi 2  = 3.15, P  = 0.21 > 0.05). High school was first on the list of effect sizes (ES = 1.36, P  < 0.01), then higher education (ES = 0.78, P  < 0.01), and middle school (ES = 0.73, P  < 0.01). These results show that, despite the learning stage’s beneficial influence on cultivating learners’ critical thinking, we are unable to explain why it is essential for cultivating critical thinking in the context of collaborative problem-solving.

Different teaching types had varying degrees of positive impact on critical thinking, with significant intergroup differences (chi 2  = 7.20, P  < 0.05). The effect size was ranked as follows: mixed courses (ES = 1.34, P  < 0.01), integrated courses (ES = 0.81, P  < 0.01), and independent courses (ES = 0.27, P  < 0.01). These results indicate that the most effective approach to cultivate critical thinking utilizing collaborative problem solving is through the teaching type of mixed courses.

Various intervention durations significantly improved critical thinking, and there were significant intergroup differences (chi 2  = 12.18, P  < 0.01). The effect sizes related to this variable showed a tendency to increase with longer intervention durations. The improvement in critical thinking reached a significant level (ES = 0.85, P  < 0.01) after more than 12 weeks of training. These findings indicate that the intervention duration and critical thinking’s impact are positively correlated, with a longer intervention duration having a greater effect.

Different learning scaffolds influenced critical thinking positively, with significant intergroup differences (chi 2  = 9.03, P  < 0.01). The resource-supported learning scaffold (ES = 0.69, P  < 0.01) acquired a medium-to-higher level of impact, the technique-supported learning scaffold (ES = 0.63, P  < 0.01) also attained a medium-to-higher level of impact, and the teacher-supported learning scaffold (ES = 0.92, P  < 0.01) displayed a high level of significant impact. These results show that the learning scaffold with teacher support has the greatest impact on cultivating critical thinking.

Various group sizes influenced critical thinking positively, and the intergroup differences were statistically significant (chi 2 = 8.77, P < 0.05). Critical thinking showed a general declining trend with increasing group size. The overall effect size for groups of 2–3 people was the largest (ES = 0.99, P < 0.01), and when the group size was greater than 7 people, the improvement in critical thinking was at the lower-middle level (ES < 0.5, P < 0.01). These results show that the impact on critical thinking is negatively associated with group size: as group size grows, the overall impact declines.

Various measuring tools influenced critical thinking positively, without significant intergroup differences (chi 2 = 0.08, P = 0.78 > 0.05). In this situation, the self-adapting measurement tools obtained an upper-medium level of effect (ES = 0.78), whereas the overall effect size of the standardized measurement tools was the largest, achieving a significant level of effect (ES = 0.84, P < 0.01). These results show that, despite the beneficial influence of the measuring tool on cultivating critical thinking, we cannot conclude that it is crucial in fostering the growth of critical thinking through collaborative problem-solving.

Different subject areas had differing positive impacts on critical thinking, and the intergroup differences were statistically significant (chi 2 = 13.36, P < 0.05). Mathematics had the greatest overall impact, achieving a significant level of effect (ES = 1.68, P < 0.01), followed by science (ES = 1.25, P < 0.01) and medical science (ES = 0.87, P < 0.01), both of which also achieved a significant level of effect. Programming technology was the least effective (ES = 0.39, P < 0.01), having only a medium-low degree of effect compared to education (ES = 0.72, P < 0.01) and other fields (such as language, art, and social sciences) (ES = 0.58, P < 0.01). These results suggest that scientific fields (e.g., mathematics, science) may be the most effective subject areas for cultivating critical thinking through collaborative problem-solving.
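The intergroup chi 2 values reported in this section are between-group heterogeneity statistics. The sketch below shows one way such a statistic can be computed for a moderator with two levels, again using hypothetical per-study data rather than the coded sample.

```python
# Minimal subgroup-analysis sketch: between-group heterogeneity (Q_between),
# compared against a chi-square distribution with k - 1 degrees of freedom.
# All per-study data below are hypothetical placeholders.
import numpy as np
from scipy.stats import chi2

def pooled(yi, vi, tau2=0.0):
    """Pooled mean and its variance for one subgroup (tau2 = 0 gives a fixed effect)."""
    w = 1 / (vi + tau2)
    return np.sum(w * yi) / np.sum(w), 1 / np.sum(w)

subgroups = {
    "mixed courses":       (np.array([1.2, 1.4, 1.3]),  np.array([0.05, 0.07, 0.06])),
    "independent courses": (np.array([0.2, 0.3, 0.35]), np.array([0.04, 0.06, 0.05])),
}

means, variances = zip(*(pooled(yi, vi) for yi, vi in subgroups.values()))
means, variances = np.array(means), np.array(variances)

w = 1 / variances
grand_mean = np.sum(w * means) / np.sum(w)
Q_between = np.sum(w * (means - grand_mean) ** 2)
df = len(subgroups) - 1
p = 1 - chi2.cdf(Q_between, df)
print(f"Q_between (chi^2) = {Q_between:.2f}, df = {df}, p = {p:.3f}")
```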

The effectiveness of collaborative problem solving with regard to teaching critical thinking

According to this meta-analysis, using collaborative problem-solving as an intervention strategy in critical thinking teaching has a considerable impact on cultivating learners’ critical thinking as a whole and has a favorable effect on both dimensions of critical thinking. Previous studies have suggested that collaborative problem-solving, the most frequently used critical thinking teaching strategy in curriculum instruction, can considerably enhance students’ critical thinking (e.g., Liang et al., 2017 ; Liu et al., 2020 ; Cindy, 2004 ), and this meta-analysis provides convergent data in support of those views. Thus, the findings not only address the first research question regarding the overall effect of collaborative problem-solving on critical thinking and its impact on the two dimensions of critical thinking (i.e., attitudinal tendency and cognitive skills), but also strengthen our confidence in cultivating critical thinking through the collaborative problem-solving intervention approach in classroom teaching.

Furthermore, the associated improvements in attitudinal tendency are much stronger, whereas the corresponding improvements in cognitive skills are only marginal. According to some studies, cognitive skills differ from attitudinal tendency in classroom instruction: the cultivation of the former as a key ability is a process of gradual accumulation, while the latter, as an attitude, is shaped by the context of the teaching situation (e.g., a novel and exciting teaching approach, challenging and rewarding tasks) (Halpern, 2001 ; Wei and Hong, 2022 ). Collaborative problem-solving as a teaching approach is exciting and interesting, as well as rewarding and challenging, because it takes learners as the focus and examines poorly structured problems in real situations; it can therefore inspire students to fully realize their potential for problem-solving, which significantly improves their attitudinal tendency toward solving problems (Liu et al., 2020 ). Just as collaborative problem-solving influences attitudinal tendency, attitudinal tendency in turn impacts cognitive skills when attempting to solve a problem (Liu et al., 2020 ; Zhang et al., 2022 ), and stronger attitudinal tendencies are associated with improved learning achievement and cognitive ability in students (Sison, 2008 ; Zhang et al., 2022 ). It can be seen that both specific dimensions of critical thinking, as well as critical thinking as a whole, are affected by collaborative problem-solving, and this study illuminates the nuanced links between cognitive skills and attitudinal tendencies across these two dimensions. To fully develop students’ capacity for critical thinking, future empirical research should pay closer attention to cognitive skills.

The moderating effects of collaborative problem solving with regard to teaching critical thinking

In order to further explore the key factors that influence critical thinking, subgroup analysis was used to examine possible moderating effects that might produce the considerable heterogeneity. The findings show that the moderating factors included in the 36 experimental designs, namely the teaching type, learning stage, group size, learning scaffold, intervention duration, measuring tool, and subject area, could all support the cultivation of critical thinking in the context of collaborative problem-solving. Among them, the effect size differences for the learning stage and the measuring tool are not significant, so these two factors cannot be shown to be crucial in supporting the cultivation of critical thinking through collaborative problem-solving.

In terms of the learning stage, all learning stages influenced critical thinking positively, but without significant intergroup differences, so the learning stage cannot be shown to be a crucial factor in fostering the growth of critical thinking.

Although higher education accounts for 70.89% of all the empirical studies performed by researchers, high school may be the most appropriate learning stage for fostering students’ critical thinking through collaborative problem-solving, since it has the largest overall effect size. This phenomenon may be related to students’ cognitive development, which needs to be further studied in follow-up research.

With regard to teaching type, mixed course teaching may be the best teaching method for cultivating students’ critical thinking. Relevant studies have shown that, in actual teaching, if students are trained in thinking methods alone, the methods they learn are isolated and divorced from subject knowledge, which is not conducive to transfer; conversely, if students’ thinking is trained only within subject teaching, without systematic method training, it is challenging to apply to real-world circumstances (Ruggiero, 2012 ; Hu and Liu, 2015 ). Teaching critical thinking as a mixed course, in parallel with other subject teaching, can achieve the best effect on learners’ critical thinking, and explicit critical thinking instruction is more effective than less explicit critical thinking instruction (Bensley and Spero, 2014 ).

In terms of the intervention duration, the overall effect size shows an upward tendency with longer intervention times; thus, intervention duration and the impact on critical thinking are positively correlated. Critical thinking, as a key competency for students in the 21st century, is difficult to improve meaningfully within a brief intervention; instead, it develops over a lengthy period of time through consistent teaching and the progressive accumulation of knowledge (Halpern, 2001 ; Hu and Liu, 2015 ). Therefore, future empirical studies ought to take these constraints into account and adopt longer periods of critical thinking instruction.

With regard to group size, a group size of 2–3 persons has the highest effect size, and the comprehensive effect size generally decreases as group size increases. This outcome is in line with earlier findings; for example, a group composed of two to four members has been found most appropriate for collaborative learning (Schellens and Valcke, 2006 ). However, the meta-analysis results also indicate that groups of more than 7 people do not produce better interaction and performance than smaller groups, although their effect on critical thinking remains positive at a lower-middle level. This may be because the learning scaffolds of technique support, resource support, and teacher support improve the frequency and effectiveness of interaction among group members, and a collaborative group with more members may increase the diversity of views, which helps to cultivate critical thinking through collaborative problem-solving.

With regard to the learning scaffold, the three different kinds of learning scaffolds can all enhance critical thinking. Among them, the teacher-supported learning scaffold has the largest overall effect size, demonstrating the interdependence of effective learning scaffolds and collaborative problem-solving. This outcome is in line with some research findings; as an example, a successful strategy is to encourage learners to collaborate, come up with solutions, and develop critical thinking skills by using learning scaffolds (Reiser, 2004 ; Xu et al., 2022 ); learning scaffolds can lower task complexity and unpleasant feelings while also enticing students to engage in learning activities (Wood et al., 2006 ); learning scaffolds are designed to assist students in using learning approaches more successfully to adapt the collaborative problem-solving process, and the teacher-supported learning scaffolds have the greatest influence on critical thinking in this process because they are more targeted, informative, and timely (Xu et al., 2022 ).

With respect to the measuring tool, although standardized measurement tools (such as the WGCTA, CCTT, and CCTST) have been acknowledged as trustworthy and effective by experts worldwide, only 54.43% of the studies included in this meta-analysis adopted them for assessment, and the results indicated no intergroup differences. These results suggest that not all teaching circumstances are appropriate for measuring critical thinking with standardized measurement tools. As Simpson and Courtney ( 2002 , p. 91) note, “The measuring tools for measuring thinking ability have limits in assessing learners in educational situations and should be adapted appropriately to accurately assess the changes in learners’ critical thinking.” As a result, in order to more fully and precisely gauge how learners’ critical thinking has evolved, standardized measuring tools should be appropriately adapted to collaborative problem-solving learning contexts.

With regard to the subject area, the comprehensive effect size of science departments (e.g., mathematics, science, medical science) is larger than that of language arts and social sciences. Some recent international education reforms have noted that critical thinking is a basic part of scientific literacy. Students with scientific literacy can prove the rationality of their judgment according to accurate evidence and reasonable standards when they face challenges or poorly structured problems (Kyndt et al., 2013 ), which makes critical thinking crucial for developing scientific understanding and applying this understanding to practical problem solving for problems related to science, technology, and society (Yore et al., 2007 ).

Suggestions for critical thinking teaching

Other than those stated in the discussion above, the following suggestions are offered for critical thinking instruction utilizing the approach of collaborative problem-solving.

First, teachers should put a special emphasis on the two core elements, which are collaboration and problem-solving, to design real problems based on collaborative situations. This meta-analysis provides evidence to support the view that collaborative problem-solving has a strong synergistic effect on promoting students’ critical thinking. Asking questions about real situations and allowing learners to take part in critical discussions on real problems during class instruction are key ways to teach critical thinking rather than simply reading speculative articles without practice (Mulnix, 2012 ). Furthermore, the improvement of students’ critical thinking is realized through cognitive conflict with other learners in the problem situation (Yang et al., 2008 ). Consequently, it is essential for teachers to put a special emphasis on the two core elements, which are collaboration and problem-solving, and design real problems and encourage students to discuss, negotiate, and argue based on collaborative problem-solving situations.

Second, teachers should design and implement mixed courses to cultivate learners’ critical thinking, utilizing the approach of collaborative problem-solving. Critical thinking can be taught through curriculum instruction (Kuncel, 2011 ; Leng and Lu, 2020 ), with the goal of cultivating learners’ critical thinking for flexible transfer and application in real problem-solving situations. This meta-analysis shows that mixed course teaching has a highly substantial impact on the cultivation and promotion of learners’ critical thinking. Therefore, teachers should design and implement mixed course teaching with real collaborative problem-solving situations in combination with the knowledge content of specific disciplines in conventional teaching, teach methods and strategies of critical thinking based on poorly structured problems to help students master critical thinking, and provide practical activities in which students can interact with each other to develop knowledge construction and critical thinking utilizing the approach of collaborative problem-solving.

Third, teachers should receive more training in critical thinking, particularly preservice teachers, and they should also be conscious of the ways in which teacher-supported learning scaffolds can promote critical thinking. The teacher-supported learning scaffold had the greatest impact on learners’ critical thinking, in addition to being more directive, targeted, and timely (Wood et al., 2006 ). Critical thinking can only be effectively taught when teachers recognize its significance for students’ growth and use appropriate approaches when designing instructional activities (Forawi, 2016 ). Therefore, with the intention of enabling teachers to create learning scaffolds that cultivate learners’ critical thinking through collaborative problem-solving, it is essential to concentrate on teacher-supported learning scaffolds and to enhance critical thinking instruction for teachers, especially preservice teachers.

Implications and limitations

There are certain limitations in this meta-analysis that future research can address. First, the search languages were restricted to English and Chinese, so pertinent studies written in other languages may have been overlooked, limiting the number of articles available for review. Second, some data provided by the included studies were missing, such as whether teachers were trained in the theory and practice of critical thinking, the average age and gender of learners, and differences in critical thinking among learners of various ages and genders. Third, as is typical for review articles, additional studies were published while this meta-analysis was being conducted, so the review is limited by its search cutoff. With the development of relevant research, future studies focusing on these issues are highly relevant and needed.

Conclusions

This study addressed the question of the effectiveness of collaborative problem-solving in promoting students’ critical thinking, and the magnitude of that impact, a topic that had received scant attention in earlier research. The following conclusions can be made:

Regarding the results obtained, collaborative problem solving is an effective teaching approach to foster learners’ critical thinking, with a significant overall effect size (ES = 0.82, z  = 12.78, P  < 0.01, 95% CI [0.69, 0.95]). With respect to the dimensions of critical thinking, collaborative problem-solving can significantly and effectively improve students’ attitudinal tendency, and the comprehensive effect is significant (ES = 1.17, z  = 7.62, P  < 0.01, 95% CI [0.87, 1.47]); nevertheless, it falls short in terms of improving students’ cognitive skills, having only an upper-middle impact (ES = 0.70, z  = 11.55, P  < 0.01, 95% CI [0.58, 0.82]).

As demonstrated by both the results and the discussion, all seven moderating factors examined across the 36 studies have varying degrees of beneficial effects on students’ critical thinking. In this context, the teaching type (chi 2 = 7.20, P < 0.05), intervention duration (chi 2 = 12.18, P < 0.01), subject area (chi 2 = 13.36, P < 0.05), group size (chi 2 = 8.77, P < 0.05), and learning scaffold (chi 2 = 9.03, P < 0.01) all have a positive impact on critical thinking and can be viewed as important moderating factors that affect how critical thinking develops. Since the learning stage (chi 2 = 3.15, P = 0.21 > 0.05) and measuring tool (chi 2 = 0.08, P = 0.78 > 0.05) did not demonstrate significant intergroup differences, we cannot conclude that these two factors are crucial in supporting the cultivation of critical thinking in the context of collaborative problem-solving.

Data availability

All data generated or analyzed during this study are included within the article and its supplementary information files, and the supplementary information files are available in the Dataverse repository: https://doi.org/10.7910/DVN/IPFJO6 .

Bensley DA, Spero RA (2014) Improving critical thinking skills and meta-cognitive monitoring through direct infusion. Think Skills Creat 12:55–68. https://doi.org/10.1016/j.tsc.2014.02.001


Castle A (2009) Defining and assessing critical thinking skills for student radiographers. Radiography 15(1):70–76. https://doi.org/10.1016/j.radi.2007.10.007

Chen XD (2013) An empirical study on the influence of PBL teaching model on critical thinking ability of non-English majors. J PLA Foreign Lang College 36 (04):68–72


Cohen A (1992) Antecedents of organizational commitment across occupational groups: a meta-analysis. J Organ Behav. https://doi.org/10.1002/job.4030130602

Cooper H (2010) Research synthesis and meta-analysis: a step-by-step approach, 4th edn. Sage, London, England

Cindy HS (2004) Problem-based learning: what and how do students learn? Educ Psychol Rev 51(1):31–39

Duch BJ, Gron SD, Allen DE (2001) The power of problem-based learning: a practical “how to” for teaching undergraduate courses in any discipline. Stylus Educ Sci 2:190–198

Ennis RH (1989) Critical thinking and subject specificity: clarification and needed research. Educ Res 18(3):4–10. https://doi.org/10.3102/0013189x018003004

Facione PA (1990) Critical thinking: a statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. Eric document reproduction service. https://eric.ed.gov/?id=ed315423

Facione PA, Facione NC (1992) The California Critical Thinking Dispositions Inventory (CCTDI) and the CCTDI test manual. California Academic Press, Millbrae, CA

Forawi SA (2016) Standard-based science education and critical thinking. Think Skills Creat 20:52–62. https://doi.org/10.1016/j.tsc.2016.02.005

Halpern DF (2001) Assessing the effectiveness of critical thinking instruction. J Gen Educ 50(4):270–286. https://doi.org/10.2307/27797889

Hu WP, Liu J (2015) Cultivation of pupils’ thinking ability: a five-year follow-up study. Psychol Behav Res 13(05):648–654. https://doi.org/10.3969/j.issn.1672-0628.2015.05.010

Huber K (2016) Does college teach critical thinking? A meta-analysis. Rev Educ Res 86(2):431–468. https://doi.org/10.3102/0034654315605917

Kek MYCA, Huijser H (2011) The power of problem-based learning in developing critical thinking skills: preparing students for tomorrow’s digital futures in today’s classrooms. High Educ Res Dev 30(3):329–341. https://doi.org/10.1080/07294360.2010.501074

Kuncel NR (2011) Measurement and meaning of critical thinking (Research report for the NRC 21st Century Skills Workshop). National Research Council, Washington, DC

Kyndt E, Raes E, Lismont B, Timmers F, Cascallar E, Dochy F (2013) A meta-analysis of the effects of face-to-face cooperative learning. Do recent studies falsify or verify earlier findings? Educ Res Rev 10(2):133–149. https://doi.org/10.1016/j.edurev.2013.02.002

Leng J, Lu XX (2020) Is critical thinking really teachable?—A meta-analysis based on 79 experimental or quasi experimental studies. Open Educ Res 26(06):110–118. https://doi.org/10.13966/j.cnki.kfjyyj.2020.06.011

Liang YZ, Zhu K, Zhao CL (2017) An empirical study on the depth of interaction promoted by collaborative problem solving learning activities. J E-educ Res 38(10):87–92. https://doi.org/10.13811/j.cnki.eer.2017.10.014

Lipsey M, Wilson D (2001) Practical meta-analysis. International Educational and Professional, London, pp. 92–160

Liu Z, Wu W, Jiang Q (2020) A study on the influence of problem based learning on college students’ critical thinking-based on a meta-analysis of 31 studies. Explor High Educ 03:43–49

Morris SB (2008) Estimating effect sizes from pretest-posttest-control group designs. Organ Res Methods 11(2):364–386. https://doi.org/10.1177/1094428106291059


Mulnix JW (2012) Thinking critically about critical thinking. Educ Philos Theory 44(5):464–479. https://doi.org/10.1111/j.1469-5812.2010.00673.x

Naber J, Wyatt TH (2014) The effect of reflective writing interventions on the critical thinking skills and dispositions of baccalaureate nursing students. Nurse Educ Today 34(1):67–72. https://doi.org/10.1016/j.nedt.2013.04.002

National Research Council (2012) Education for life and work: developing transferable knowledge and skills in the 21st century. The National Academies Press, Washington, DC

Niu L, Behar HLS, Garvan CW (2013) Do instructional interventions influence college students’ critical thinking skills? A meta-analysis. Educ Res Rev 9(12):114–128. https://doi.org/10.1016/j.edurev.2012.12.002

Peng ZM, Deng L (2017) Towards the core of education reform: cultivating critical thinking skills as the core of skills in the 21st century. Res Educ Dev 24:57–63. https://doi.org/10.14121/j.cnki.1008-3855.2017.24.011

Reiser BJ (2004) Scaffolding complex learning: the mechanisms of structuring and problematizing student work. J Learn Sci 13(3):273–304. https://doi.org/10.1207/s15327809jls1303_2

Ruggiero VR (2012) The art of thinking: a guide to critical and creative thought, 4th edn. Harper Collins College Publishers, New York

Schellens T, Valcke M (2006) Fostering knowledge construction in university students through asynchronous discussion groups. Comput Educ 46(4):349–370. https://doi.org/10.1016/j.compedu.2004.07.010

Sendag S, Odabasi HF (2009) Effects of an online problem based learning course on content knowledge acquisition and critical thinking skills. Comput Educ 53(1):132–141. https://doi.org/10.1016/j.compedu.2009.01.008

Sison R (2008) Investigating Pair Programming in a Software Engineering Course in an Asian Setting. 2008 15th Asia-Pacific Software Engineering Conference, pp. 325–331. https://doi.org/10.1109/APSEC.2008.61

Simpson E, Courtney M (2002) Critical thinking in nursing education: literature review. Int J Nurs Pract 8(2):89–98

Stewart L, Tierney J, Burdett S (2006) Do systematic reviews based on individual patient data offer a means of circumventing biases associated with trial publications? Publication bias in meta-analysis. John Wiley and Sons Inc, New York, pp. 261–286

Tiwari A, Lai P, So M, Yuen K (2010) A comparison of the effects of problem-based learning and lecturing on the development of students’ critical thinking. Med Educ 40(6):547–554. https://doi.org/10.1111/j.1365-2929.2006.02481.x

Wood D, Bruner JS, Ross G (2006) The role of tutoring in problem solving. J Child Psychol Psychiatry 17(2):89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x

Wei T, Hong S (2022) The meaning and realization of teachable critical thinking. Educ Theory Practice 10:51–57

Xu EW, Wang W, Wang QX (2022) A meta-analysis of the effectiveness of programming teaching in promoting K-12 students’ computational thinking. Educ Inf Technol. https://doi.org/10.1007/s10639-022-11445-2

Yang YC, Newby T, Bill R (2008) Facilitating interactions through structured web-based bulletin boards: a quasi-experimental study on promoting learners’ critical thinking skills. Comput Educ 50(4):1572–1585. https://doi.org/10.1016/j.compedu.2007.04.006

Yore LD, Pimm D, Tuan HL (2007) The literacy component of mathematical and scientific literacy. Int J Sci Math Educ 5(4):559–589. https://doi.org/10.1007/s10763-007-9089-4

Zhang T, Zhang S, Gao QQ, Wang JH (2022) Research on the development of learners’ critical thinking in online peer review. Audio Visual Educ Res 6:53–60. https://doi.org/10.13811/j.cnki.eer.2022.06.08

Download references

Acknowledgements

This research was supported by the graduate scientific research and innovation project of Xinjiang Uygur Autonomous Region named “Research on in-depth learning of high school information technology courses for the cultivation of computing thinking” (No. XJ2022G190) and the independent innovation fund project for doctoral students of the College of Educational Science of Xinjiang Normal University named “Research on project-based teaching of high school information technology courses from the perspective of discipline core literacy” (No. XJNUJKYA2003).

Author information

Authors and affiliations

College of Educational Science, Xinjiang Normal University, 830017, Urumqi, Xinjiang, China

Enwei Xu, Wei Wang & Qingxia Wang


Corresponding authors

Correspondence to Enwei Xu or Wei Wang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Xu, E., Wang, W. & Wang, Q. The effectiveness of collaborative problem solving in promoting students’ critical thinking: A meta-analysis based on empirical literature. Humanit Soc Sci Commun 10 , 16 (2023). https://doi.org/10.1057/s41599-023-01508-1


Received : 07 August 2022

Accepted : 04 January 2023

Published : 11 January 2023

DOI : https://doi.org/10.1057/s41599-023-01508-1



Inquiry and critical thinking skills for the next generation: from artificial intelligence back to human intelligence

  • Jonathan Michael Spector   ORCID: orcid.org/0000-0002-6270-3073 1 &
  • Shanshan Ma 1  

Smart Learning Environments volume  6 , Article number:  8 ( 2019 ) Cite this article

31k Accesses

53 Citations

32 Altmetric

Metrics details

Along with the increasing attention to artificial intelligence (AI), renewed emphasis or reflection on human intelligence (HI) is appearing in many places and at multiple levels. One of the foci is critical thinking. Critical thinking is one of four key 21st century skills – communication, collaboration, critical thinking and creativity. Though most people are aware of the value of critical thinking, it lacks emphasis in curricula. In this paper, we present a comprehensive definition of critical thinking that ranges from observation and inquiry to argumentation and reflection. Given a broad conception of critical thinking, a developmental approach beginning with children is suggested as a way to help develop critical thinking habits of mind. The conclusion of this analysis is that more emphasis should be placed on developing human intelligence, especially in young children and with the support of artificial intelligence. While much funding and support goes to the development of artificial intelligence, this should not happen at the expense of human intelligence. Overall, the purpose of this paper is to argue for more attention to the development of human intelligence with an emphasis on critical thinking.

Introduction

In recent decades, advancements in Artificial Intelligence (AI) have developed at an incredible rate. AI has penetrated into people’s daily life on a variety of levels such as smart homes, personalized healthcare, security systems, self-service stores, and online shopping. One notable AI achievement was when AlphaGo, a computer program, defeated the World Go Champion Mr. Lee Sedol in 2016. In the previous year, AlphaGo won in a competition against a professional Go player (Silver et al. 2016 ). As Go is one of the most challenging games, the wins of AI indicated a breakthrough. Public attention has been further drawn to AI since then, and AlphaGo continues to improve. In 2017, a new version of AlphaGo beat Ke Jie, the current world No.1 ranking Go player. Clearly AI can manage high levels of complexity.

Given many changes and multiple lines of development and implementation, it is somewhat difficult to define AI in a way that includes all of the changes since the 1980s (Luckin et al. 2016 ). Many definitions incorporate two dimensions as a starting point: (a) human-like thinking, and (b) rational action (Russell and Norvig 2009 ). Basically, AI is a term used to label machines (computers) that imitate human cognitive functions such as learning and problem solving, or that manage to deal with complexity as well as human experts do.

AlphaGo’s wins against human players were seen as a comparison between artificial and human intelligence. One concern is that AI has already surpassed HI; other concerns are that AI will replace humans in some settings or that AI will become uncontrollable (Epstein 2016 ; Fang et al. 2018 ). Scholars worry that AI technology in the future might trigger the singularity (Good 1966 ), a hypothesized future in which the development of technology becomes uncontrollable and irreversible, resulting in unfathomable changes to human civilization (Vinge 1993 ).

The famous theoretical physicist Stephen Hawking warned that AI might end mankind, yet the technology he used to communicate involved a basic form of AI (Cellan-Jones 2014 ). This example highlights one of the basic dilemmas of AI – namely, what are the overall benefits of AI versus its potential drawbacks, and how should we move forward given its rapid development? Obviously, basic or controllable AI technologies are not what people are afraid of. Spector et al. ( 1993 ) distinguished between strong AI and weak AI. Strong AI involves an application that is intended to replace an activity performed previously by a competent human, while weak AI involves an application that aims to enable a less experienced human to perform at a much higher level. Other researchers categorize AI into three levels: (a) artificial narrow intelligence (Narrow AI), (b) artificial general intelligence (General AI), and (c) artificial super intelligence (Super AI) (Siau and Yang 2017 ; Zhang and Xie 2018 ). Narrow AI, sometimes called weak AI, refers to a computer that focuses on a narrow task, such as AlphaZero or a self-driving car. General AI, sometimes referred to as strong AI, is the simulation of human-level intelligence, which can perform most cognitive tasks as well as humans do. Super AI is defined by Bostrom ( 1998 ) as “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills” (p. 1).

Although the consequences of the singularity and its potential benefits or harm to the human race have been intensely debated, an undeniable fact is that AI is capable of undertaking recursive self-improvement. With the continuing improvement of this capability, more intelligent generations of AI will appear rapidly. On the other hand, HI has its own limits, and its development requires continuous effort and investment from generation to generation. Education is the main approach humans use to develop and improve HI. Given the extraordinary gap in growth rates between AI and HI, AI may eventually surpass HI. However, that is no reason to neglect the development and improvement of HI. In addition, in contrast to the slow development rate of HI, funding support for AI has been increasing rapidly, as the following comparison of support for artificial and human intelligence shows.

The funding support for artificial and human intelligence

There are challenges in comparing artificial and human intelligence by identifying funding for both. Both terms are somewhat vague and can include a variety of aspects. Some analyses will include big data and data analytics within the sphere of artificial intelligence and others will treat them separately. Some will include early childhood developmental research within the sphere of support for HI and others treat it separately. Education is a major way for human beings to develop and improve HI. Investments in education reflect the effort put into the development of HI, and they pale in comparison with investments in AI.

Sources also vary from governmental funding of research and development to business and industry investments in related research and development. Nonetheless, there are strong indications of increased funding support for AI in North America, Europe and Asia, especially in China. The growth in funding for AI around the world is explosive. According to ZDNet, AI funding more than doubled from 2016 to 2017 and more than tripled from 2016 to 2018. The growth in funding for AI in the last 10 years has been exponential. According to Venture Scanner, there are approximately 2500 companies that have raised $60 billion in funding from 3400 investors in 72 different countries (see https://www.slideshare.net/venturescanner/artificial-intelligence-q1-2019-report-highlights ). Areas included in the Venture Scanner analysis included virtual assistants, recommendation engines, video recognition, context-aware computing, speech recognition, natural language processing, machine learning, and more.

The above data on AI funding focuses primarily on companies making products. There is no direct counterpart in the area of HI where the emphasis is on learning and education. What can be seen, however, are trends within each area. The above data suggest exponential growth in support for AI. In contrast, according to the Urban Institute, per-student funding in the USA has been relatively flat for nearly two decades, with a few states showing modest increases and others showing none (see http://apps.urban.org/features/education-funding-trends/ ). Funding for education is complicated due to the various sources. In the USA, there are local, state and federal sources to consider. While that mixture of funding sources is complex, it is clear that federal and state spending for education in the USA experienced an increase after World War II. However, since the 1980s, federal spending for education has steadily declined, and state spending on education in most states has declined since 2010 according to a government report (see https://www.usgovernmentspending.com/education_spending ). This decline in funding reflects the decreasing emphasis on the development of HI, which is a dangerous signal.

Decreased support for education funding in the USA is not typical of what is happening in other countries, according to The Hechinger Report (see https://hechingerreport.org/rest-world-invests-education-u-s-spends-less/ ). For example, in the period of 2010 to 2014, American spending on elementary and high school education declined 3%, whereas in the same period, education spending in the 35 countries in the OECD rose by 5% with some countries experiencing very significant increases (e.g., 76% in Turkey).

Such data can be questioned in terms of how effectively funds are being spent or how poorly a country was doing prior to experiencing a significant increase. However, given the performance of American students on the Program for International Student Assessment (PISA), the relative lack of funding support in the USA is roughly consistent with the mediocre performance on PISA tests (see https://nces.ed.gov/surveys/pisa/pisa2015/index.asp ). Research by Darling-Hammond ( 2014 ) indicated that in order to improve learning and reduce the achievement gap, systematic government investments in high-need schools would be more effective if the focus were on capacity building, improving the knowledge and skills of educators and the quality of curriculum opportunities.

Though HI cannot be defined simply by performance on the PISA test, improving HI likewise requires systematic effort and funding support in high-need areas. In the following section, we therefore present a reflection on HI.

Reflection on human intelligence

Though there is a variety of definitions of HI, from the perspective of psychology, according to Sternberg ( 1999 ), intelligence is a form of developing expertise: to move from a novice or less experienced person to an expert or more experienced person, a student must go through multiple learning (implicit and explicit) and thinking (critical and creative) processes. In this paper, we adopt this view and reflect on HI in the following sections by discussing learning and critical thinking.

What is learning?

We begin with Gagné’s ( 1985 ) definition of learning as characterized by stable and persistent changes in what a person knows or can do. How do humans learn? Do you recall how to prove that the square root of 2 is not a rational number, something you might have learned years ago? The method is intriguing and is called an indirect proof or a reduction to absurdity – assume that the square root of 2 is a rational number and then apply truth-preserving rules to arrive at a contradiction, showing that the square root of 2 cannot be a rational number. We recommend this as an exercise for those readers who have never encountered that method of learning and proof (see https://artofproblemsolving.com/wiki/index.php/Proof_by_contradiction ). Yet another interesting method of learning is called the process of elimination, sometimes attributed to Arthur Conan Doyle ( 1926 ) in The Adventure of the Blanched Soldier – Sherlock Holmes says to Dr. Watson that the process of elimination “starts upon the supposition that when you have eliminated all which is impossible, that whatever remains, however improbable, must be the truth ” (see https://www.dfw-sherlock.org/uploads/3/7/3/8/37380505/1926_november_the_adventure_of_the_blanched_soldier.pdf ).
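For readers who want the indirect proof spelled out, a minimal sketch in LaTeX follows (an illustrative reconstruction of the standard argument, not material from the cited source):

```latex
% Minimal sketch: indirect proof (reduction to absurdity) that sqrt(2) is irrational.
\documentclass{article}
\usepackage{amsmath,amsthm}
\begin{document}
\begin{proof}[Sketch: $\sqrt{2}$ is not rational]
Assume, for contradiction, that $\sqrt{2} = p/q$ for integers $p, q$ with
$q \neq 0$ and $\gcd(p, q) = 1$ (the fraction is in lowest terms).
Squaring gives $p^2 = 2q^2$, so $p^2$ is even and hence $p$ is even; write $p = 2k$.
Substituting gives $4k^2 = 2q^2$, that is, $q^2 = 2k^2$, so $q$ is even as well.
Then $2$ divides $\gcd(p, q)$, contradicting $\gcd(p, q) = 1$.
Hence $\sqrt{2}$ cannot be a rational number.
\end{proof}
\end{document}
```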

The reason to mention Sherlock Holmes early in this paper is to emphasize the role that observation plays in learning. The character Sherlock Holmes was famous for his observation skills that led to his so-called method of deductive reasoning (a process of elimination), which is what logicians would classify as inductive reasoning as the conclusions of that reasoning process are primarily probabilistic rather than certain, unlike the proof of the irrationality of the square root of 2 mentioned previously.

In dealing with uncertainty, it seems necessary to make observations and gather evidence that can lead one to a likely conclusion. Is that not what reasonable people and accomplished detectives do? It is certainly what card counters do at gambling houses; they observe high and low value cards that have already been played in order to estimate the likelihood of the next card being a high or low value card. Observation is a critical process in dealing with uncertainty.
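To make the card-counting illustration concrete, the following short Python sketch estimates the chance that the next card is high-value from the cards already observed. The split of a 52-card deck into 20 “high” and 32 “low” cards and the example counts are assumptions made purely for illustration:

```python
# Illustrative sketch: estimating the probability that the next card is
# high-value, given the high and low cards observed so far. The deck split
# (20 high, 32 low) and the example counts are assumed values.
def p_next_card_high(seen_high, seen_low, deck_high=20, deck_low=32):
    """Probability that the next card drawn is high, given cards seen so far."""
    remaining_high = deck_high - seen_high
    remaining_low = deck_low - seen_low
    remaining = remaining_high + remaining_low
    return remaining_high / remaining if remaining else 0.0

# Before any observation the estimate is 20/52 (about 0.385); after seeing
# 12 high and 10 low cards, only 8 of the remaining 30 cards are high.
print(round(p_next_card_high(0, 0), 3))    # -> 0.385
print(round(p_next_card_high(12, 10), 3))  # -> 0.267
```

The point is the same one made above: each new observation updates the estimate of an uncertain outcome.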

Moreover, humans typically encounter many uncertain situations in the course of life. Few people encounter situations which require resolution using a mathematical proof such as the one with which this article began. Jonassen ( 2000 , 2011 ) argued that problem solving is one of the most important and frequent activities in which people engage. Moreover, many of the more challenging problems are ill-structured in the sense that (a) there is incomplete information pertaining to the situation, or (b) the ideal resolution of the problem is unknown, or (c) how to transform a problematic situation into an acceptable situation is unclear. In short, people are confronted with uncertainty nearly every day and in many different ways. The so called key 21st century skills of communication, collaboration, critical thinking and creativity (the 4 Cs; see http://www.battelleforkids.org/networks/p21 ) are important because uncertainty is a natural and inescapable aspect of the human condition. The 4 Cs are interrelated and have been presented by Spector ( 2018 ) as interrelated capabilities involving logic and epistemology in the form of the new 3Rs – namely, re-examining, reasoning, and reflecting. Re-examining is directly linked to observation as a beginning point for inquiry. The method of elimination is one form of reasoning in which a person engages to solve challenging problems. Reflecting on how well one is doing in the life-long enterprise of solving challenging problems is a higher kind of meta-cognitive activity in which accomplished problem-solvers engage (Ericsson et al. 1993 ; Flavell 1979 ).

Based on these initial comments, a comprehensive definition of critical thinking is presented next in the form of a framework.

A framework of critical thinking

Though there is a variety of definitions of critical thinking, a concise definition of critical thinking remains elusive. To deliver a direct understanding of critical thinking to readers such as parents and school teachers, in this paper we present a comprehensive definition of critical thinking through a framework that includes many of the definitions offered by others. Critical thinking, as treated broadly herein, is a multi-dimensional and multifaceted human capability. Critical thinking has been interpreted from three perspectives: education, psychology, and epistemology, all of which are represented in the framework that follows.

In a developmental approach to critical thinking, Spector ( 2019 ) argues that critical thinking involves a series of cumulative and related abilities, dispositions and other variables (e.g., motivation, criteria, context, knowledge). This approach proceeds from experience (e.g., observing something unusual) and then to various forms of inquiry, investigation, examination of evidence, exploration of alternatives, argumentation, testing conclusions, rethinking assumptions, and reflecting on the entire process.

Experience and engagement are ongoing throughout the process which proceeds from relatively simple experiences (e.g., direct and immediate observation) to more complex interactions (e.g., manipulation of an actual or virtual artifact and observing effects).

The developmental approach involves a variety of mental processes and non-cognitive states, which help a person’s decision making to become purposeful and goal directed. The associated critical thinking skills make individuals more likely to achieve a desired outcome in a challenging situation.

In the process of critical thinking, apart from experience, there are two additional cognitive capabilities essential to critical thinking – namely, metacognition and self-regulation . Many researchers (e.g., Schraw et al. 2006 ) believe that metacognition has two components: (a) awareness and understanding of one’s own thoughts, and (b) the ability to regulate one’s own cognitive processes. Some other researchers put more emphasis on the latter component. For example, Davies ( 2015 ) described metacognition as the capacity to monitor the quality of one’s thinking process, and then to make appropriate changes. However, the American Psychological Association (APA) defines metacognition as an awareness and understanding of one’s own thought with the ability to control related cognitive processes (see https://psycnet.apa.org/record/2008-15725-005 ).

Although the definition and elaboration of these two concepts deserve further exploration, they are often used interchangeably (Hofer and Sinatra 2010 ; Schunk 2008 ). Many psychologists see the two related capabilities of metacognition and self-regulation as being closely related – two sides of the same coin, so to speak. Metacognition involves or emphasizes awareness, whereas self-regulation involves and emphasizes appropriate control. These two concepts taken together enable a person to create a self-regulatory mechanism, which monitors and regulates the corresponding skills (e.g., observation, inquiry, interpretation, explanation, reasoning, analysis, evaluation, synthesis, reflection, and judgement).

As to the critical thinking skills, it should be noted that there is much discussion about the generalizability and domain specificity of them, just as there is about problem-solving skills in general (Chi et al. 1982 ; Chiesi et al. 1979 ; Ennis 1989 ; Fischer 1980 ). The research supports the notion that to achieve high levels of expertise and performance, one must develop high levels of domain knowledge. As a consequence, becoming a highly effective critical thinker in a particular domain of inquiry requires significant domain knowledge. One may achieve such levels in a domain in which one has significant domain knowledge and experience but not in a different domain in which one has little domain knowledge and experience. The processes involved in developing high levels of critical thinking are somewhat generic. Therefore, it is possible to develop critical thinking in nearly any domain when the two additional capabilities of metacognition and self-regulation are coupled with motivation and engagement and supportive emotional states (Ericsson et al. 1993 ).

Consequently, the framework presented here (see Fig. 1 ) is built around three main perspectives about critical thinking (i.e., educational, psychological and epistemological) and relevant learning theories. This framework provides a visual presentation of critical thinking with four dimensions: abilities (educational perspective), dispositions (psychological perspective), levels (epistemological perspective) and time. Time is added to emphasize the dynamic nature of critical thinking in terms of a specific context and a developmental approach.

Fig. 1 A framework of critical thinking with four dimensions: abilities, dispositions, levels, and time

Critical thinking often begins with simple experiences such as observing a difference, encountering a puzzling question or problem, questioning someone’s statement, and then leads, in some instances to an inquiry, and then to more complex experiences such as interactions and application of higher order thinking skills (e.g., logical reasoning, questioning assumptions, considering and evaluating alternative explanations).

If the individual is not interested in what was observed, an inquiry typically does not begin. Inquiry and critical thinking require motivation along with an inquisitive disposition. The process of critical thinking requires the support of corresponding internal dispositions such as open-mindedness and truth-seeking. Consequently, a disposition to initiate an inquiry (e.g., curiosity) along with an internal inquisitive disposition (e.g., one that links a mental habit to something motivating to the individual) are both required (Hitchcock 2018 ). Initiating dispositions are those that contribute to the start of inquiry and critical thinking. Internal dispositions are those that initiate and support corresponding critical thinking skills during the process. Therefore, critical thinking dispositions consist of initiating dispositions and internal dispositions. Besides these factors, critical thinking also involves motivation. Motivation and dispositions are not mutually exclusive; for example, curiosity is a disposition and also a motivation.

Critical thinking abilities and dispositions are two main components of critical thinking, which involve such interrelated cognitive constructs as interpretation, explanation, reasoning, evaluation, synthesis, reflection, judgement, metacognition and self-regulation (Dwyer et al. 2014 ; Davies 2015 ; Ennis 2018 ; Facione 1990 ; Hitchcock 2018 ; Paul and Elder 2006 ). There are also some other abilities such as communication, collaboration and creativity, which are now essential in current society (see https://en.wikipedia.org/wiki/21st_century_skills ). Those abilities along with critical thinking are called the 4Cs; they are individually monitored and regulated through metacognitive and self-regulation processes.

The abilities involved in critical thinking are categorized in Bloom’s taxonomy into higher order skills (e.g., analyzing and synthesizing) and lower level skills (e.g., remembering and applying) (Anderson and Krathwohl 2001 ; Bloom et al. 1956 ).

The thinking process can be depicted as a spiral through both lower and higher order thinking skills. It encompasses several reasoning loops. Some of them might be iterative until a desired outcome is achieved. Each loop might be a mix of higher order thinking skills and lower level thinking skills. Each loop is subject to the self-regulatory mechanism of metacognition and self-regulation.

But, due to the complexity of human thinking, a specific spiral with reasoning loops is difficult to represent. Therefore, instead of a visualized spiral with an indefinite number of reasoning loops, the developmental stages of critical thinking are presented in the diagram (Fig. 1 ).

Besides, most definitions of critical thinking are based on an idealized image of critical thinkers, such as the consensus generated from the Delphi report (Facione 1990 ). However, according to Dreyfus and Dreyfus ( 1980 ), in the course of developing expertise, students pass through five stages: “absolute beginner”, “advanced beginner”, “competent performer”, “proficient performer”, and “intuitive expert performer”. Dreyfus and Dreyfus ( 1980 ) described these five stages as the result of the successive transformations of four mental functions: recollection, recognition, decision making, and awareness.

In the course of developing critical thinking and expertise, individuals pass through similar stages, accompanied by increasing practice and the accumulation of experience. At the novice stage, tasks are decomposed into context-free features that students can recognize without experience of particular situations. To improve further, students need to monitor their awareness and, with considerable experience, they can note recurrent meaningful component patterns in some contexts. Gradually, increased practice exposes students to a variety of whole situations, which enables them to recognize tasks in a more holistic manner, as a professional does. At the same time, with the accumulation of experience, individuals are less likely to depend simply on abstract principles; decisions become intuitive and highly situational as well as analytical. Students may apply rules, principles or abilities unconsciously, and a high level of awareness becomes absorbed. At this stage, critical thinking has turned into habits of mind and, in some cases, expertise. This description presents critical thinking development as evolving from novice to expert, eventually turning critical thinking into habits of mind.

We mention the five-stage model proposed by Dreyfus and Dreyfus ( 1980 ) to categorize levels of critical thinking and emphasize the developmental nature involved in becoming a critical thinker. Correspondingly, critical thinking is categorized into 5 levels: absolute beginner (novice), advanced beginner (beginner), competent performer (competent), proficient performer (proficient), and intuitive expert (expert).

Ability level and critical thinker (critical thinking) level together represent one of the four dimensions represented in Fig. 1 .

In addition, it is noteworthy that the other two elements of critical thinking are the context and knowledge in which the inquiry is based. Contextual and domain knowledge must be taken into account with regard to critical thinking, as previously argued. Besides, as Hitchcock ( 2018 ) argued, effective critical thinking requires knowledge about and experience applying critical thinking concepts and principles as well.

Critical thinking is considered valuable across disciplines. But except for a few courses such as philosophy, critical thinking is reportedly lacking in most school education. Most researchers and educators thus advocate integrating critical thinking across the curriculum (Hatcher 2013 ). For example, Ennis ( 2018 ) provided a vision for incorporating critical thinking across the curriculum in higher education. Though people are aware of the value of critical thinking, few of them practice it. Between 2012 and 2015, in Australia, demand for critical thinking as one of the enterprise skills for early-career jobs increased by 125% (Statista Research Department, 2016). According to a survey of 1000 adults by The Reboot Foundation ( 2018 ), more than 80% of respondents believed that critical thinking skills are lacking in today’s youth. Respondents were deeply concerned that schools do not teach critical thinking. The survey also found that respondents were clearly split over when and how to teach critical thinking.

In the previous analysis of critical thinking, we presented the mechanism of critical thinking instead of a concise definition. This is because, given the various perspectives for interpreting critical thinking, it is not easy to come up with a unitary definition; but it is essential for the public to understand how critical thinking works, the elements it involves and the relationships between them, so that they can achieve an explicit understanding.

In the framework, critical thinking starts from simple experiences such as observing a difference and then enters the stage of inquiry. Inquiry does not necessarily turn the thinking process into critical thinking unless the student enters higher-level thinking processes or reasoning loops such as re-examining, reasoning, and reflection (the 3Rs). Becoming an ideal critical thinker (or an expert) requires effort and time.

According to the framework, simple abilities such as observational skills and inquiry are indispensable in leading to critical thinking, which suggests that paying attention to those simple skills at an early stage of childhood can be an entry point to critical thinking. Considering Piaget’s ( 1964 ) theory of child development, a developmental approach spanning multiple years can be employed to help children develop critical thinking at each corresponding developmental stage until critical thinking becomes habits of mind.

Although we have emphasized critical thinking in this paper, creative thinking and critical thinking, while separable, are both essential abilities for developing expertise and, ultimately, for driving the improvement of HI at the level of the human race.

As previously argued, there is a similar pattern among students who think critically in different domains, but students from different domains might perform differently in creativity because of different thinking styles (Haller and Courvoisier 2010 ). In addition, students have different learning styles and preferences, and personalized learning has been the most appropriate approach to address those differences, though the way of realizing personalized learning varies with the development of technologies. Generally, personalized learning aims at customizing learning to accommodate diverse students based on their strengths, needs, interests, preferences, and abilities.

Meanwhile, the advancement of technology, including AI, is revolutionizing education; students’ learning environments are shifting from technology-enhanced learning environments to smart learning environments. Although much of the potential remains unrealized (Spector 2016 ), these so-called smart learning environments rely heavily on the support of AI technology such as neural networks, learning analytics and natural language processing. Personalized learning is better supported and realized in a smart learning environment. In short, in the current era, personalized learning means using AI to help learners perform at a higher level, making adjustments based on differences among learners. This is the notion with which we conclude – the future lies in using AI to improve HI and to accommodate individual differences.

The application of AI in education has been a subject of study for decades. There are efforts heading in such a direction, though personalized learning is not always technically involved in them: for example, using AI technology to stimulate critical thinking (Zhu 2015 ) and applying a virtual environment for building and assessing higher-order inquiry skills (Ketelhut et al. 2010 ). Developing computational thinking through robotics (Angeli and Valanides 2019 ) is another such promising application of AI to support the development of HI.

However, almost all of those efforts are limited to laboratory experiments. To accelerate the development of HI, we argue that more emphasis should be given to developing HI at scale with the support of AI, especially in young children, with a focus on critical and creative thinking.

In this paper, we argue that more emphasis should be given to HI development. Rather than decreasing the funding of AI, the analysis of progress in artificial and human intelligence indicates that it would be reasonable to see increased emphasis placed on using various AI techniques and technologies to improve HI on a large and sustainable scale. Most researchers might agree that AI techniques, or the situation more generally, are not yet mature enough to support such large-scale development, but it would be dangerous if HI development were overlooked. Based on research and theory drawn from psychology as well as from epistemology, the framework is intended to provide a practical guide to the progressive development of inquiry and critical thinking skills in young children, as children represent the future of our fragile planet. We have also suggested a sustainable development approach for developing inquiry and critical thinking (see Spector 2019 ). Such an approach could be realized through AI and infused into HI development. In addition, a project is underway in collaboration with NetDragon to develop gamified applications that develop the relevant skills and habits of mind. A game-based assessment methodology is being developed and tested at East China Normal University that is appropriate for middle school children. The intention of the effort is to refocus some of the attention on the development of HI in young children.

Availability of data and materials

Not applicable.

Abbreviations

AI: Artificial Intelligence

HI: Human Intelligence

References

L.W. Anderson, D.R. Krathwohl, A taxonomy for learning, teaching, and assessing: A revision of bloom’s taxonomy of educational objectives (Allyn & Bacon, Boston, 2001)


Angeli, C., & Valanides, N. (2019). Developing young children’s computational thinking with educational robotics: An interaction effect between gender and scaffolding strategy. Comput. Hum. Behav. Retrieved from https://doi.org/10.1016/j.chb.2019.03.018

B.S. Bloom, M.D. Engelhart, E.J. Furst, W.H. Hill, D.R. Krathwohl, Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive Domain (David McKay Company, New York, 1956)

Bostrom, N. (1998). How long before superintelligence? Retrieved from https://nickbostrom.com/superintelligence.html

R. Cellan-Jones, Stephen Hawking warns artificial intelligence could end mankind. BBC News 2 , 2014 (2014)

M.T.H. Chi, R. Glaser, E. Rees, in Advances in the Psychology of Human Intelligence , ed. by R. S. Sternberg. Expertise in problem solving (Erlbaum, Hillsdale, 1982), pp. 7–77

H.L. Chiesi, G.J. Spilich, J.F. Voss, Acquisition of domain-related information in relation to high and low domain knowledge. J. Verbal Learn. Verbal Behav. 18 , 257–273 (1979)


L. Darling-Hammond, What can PISA tell US about US education policy? N. Engl. J. Publ. Policy. 26 (1), 4 (2014)

M. Davies, in Higher education: Handbook of theory and research . A Model of Critical Thinking in Higher Education (Springer, Cham, 2015), pp. 41–92


A.C. Doyle, in The Strand Magazine . The adventure of the blanched soldier (1926) Retrieved from https://www.dfw-sherlock.org/uploads/3/7/3/8/37380505/1926_november_the_adventure_of_the_blanched_soldier.pdf

S.E. Dreyfus, H.L. Dreyfus, A five-stage model of the mental activities involved in directed skill acquisition (no. ORC-80-2) (University of California-Berkeley Operations Research Center, Berkeley, 1980)


C.P. Dwyer, M.J. Hogan, I. Stewart, An integrated critical thinking framework for the 21st century. Think. Skills Creat. 12 , 43–52 (2014)

R.H. Ennis, Critical thinking and subject specificity: Clarification and needed research. Educ. Res. 18 , 4–10 (1989)

R.H. Ennis, Critical thinking across the curriculum: A vision. Topoi. 37 (1), 165–184 (2018)

Epstein, Z. (2016). Has artificial intelligence already surpassed the human brain? Retrieved from https://bgr.com/2016/03/10/alphago-beats-lee-sedol-again/

K.A. Ericsson, R.T. Krampe, C. Tesch-Römer, The role of deliberate practice in the acquisition of expert performance. Psychol. Rev. 100 (3), 363–406 (1993)

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction [Report for the American Psychology Association]. Retrieved from https://files.eric.ed.gov/fulltext/ED315423.pdf

J. Fang, H. Su, Y. Xiao, Will Artificial Intelligence Surpass Human Intelligence? (2018). https://doi.org/10.2139/ssrn.3173876

K.W. Fischer, A theory of cognitive development: The control and construction of hierarchies of skills. Psychol. Rev. 87 , 477–531 (1980)

J.H. Flavell, Metacognition and cognitive monitoring: A new area of cognitive development inquiry. Am. Psychol. 34 (10), 906–911 (1979)

R.M. Gagné, The conditions of learning and theory of instruction , 4th edn. (Holt, Rinehart, & Winston, New York, 1985)

I.J. Good, Speculations concerning the first ultraintelligent machine. Adv Comput. 6 , 31-88 (1966)

C.S. Haller, D.S. Courvoisier, Personality and thinking style in different creative domains. Psychol. Aesthet. Creat. Arts. 4 (3), 149 (2010)

D.L. Hatcher, Is critical thinking across the curriculum a plausible goal? OSSA. 69 (2013) Retrieved from https://scholar.uwindsor.ca/ossaarchive/OSSA10/papersandcommentaries/69

Hitchcock, D. (2018). Critical thinking. Retrieved from https://plato.stanford.edu/entries/critical-thinking/

B.K. Hofer, G.M. Sinatra, Epistemology, metacognition, and self-regulation: Musings on an emerging field. Metacogn. Learn. 5 (1), 113–120 (2010)

D.H. Jonassen, Toward a design theory of problem solving. Educ. Technol. Res. Dev. 48 (4), 63–85 (2000)

D.H. Jonassen, Learning to Solve Problems: A Handbook for Designing Problem-Solving Learning Environments (Routledge, New York, 2011)

D.J. Ketelhut, B.C. Nelson, J. Clarke, C. Dede, A multi-user virtual environment for building and assessing higher order inquiry skills in science. Br. J. Educ. Technol. 41 (1), 56–68 (2010)

R. Luckin, W. Holmes, M. Griffiths, L.B. Forcier, Intelligence Unleashed: An Argument for AI in Education (Pearson Education, London, 2016) Retrieved from http://oro.open.ac.uk/50104/1/Luckin%20et%20al.%20-%202016%20-%20Intelligence%20Unleashed.%20An%20argument%20for%20AI%20in%20Educ.pdf

R. Paul, L. Elder, The miniature guide to critical thinking: Concepts and tools , 4th edn. (2006) Retrieved from https://www.criticalthinking.org/files/Concepts_Tools.pdf

J. Piaget, Part I: Cognitive development in children: Piaget development and learning. J. Res. Sci. Teach. 2 (3), 176–186 (1964)

S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach , 3rd edn. (Prentice Hall, Upper Saddle River, 2009) ISBN 978-0-136042594

G. Schraw, K.J. Crippen, K. Hartley, Promoting self-regulation in science education: Metacognition as part of a broader perspective on learning. Res. Sci. Educ. 36 (1–2), 111–139 (2006)

D.H. Schunk, Metacognition, self-regulation, and self-regulated learning: Research recommendations. Educ. Psychol. Rev. 20 (4), 463–467 (2008)

K. Siau, Y. Yang, in Twelve Annual Midwest Association for Information Systems Conference (MWAIS 2017) . Impact of artificial intelligence, robotics, and machine learning on sales and marketing (2017), pp. 18–19

D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al., Mastering the game of Go with deep neural networks and tree search. Nature. 529 (7587), 484 (2016)

J. M. Spector, M. C. Polson, D. J. Muraida (eds.), Automating Instructional Design: Concepts and Issues (Educational Technology Publications, Englewood Cliffs, 1993)

J.M. Spector, Smart Learning Environments: Concepts and Issues . In G. Chamblee & L. Langub (Eds.), Proceedings of Society for Information Technology & Teacher Education International Conference (pp. 2728–2737). (Association for the Advancement of Computing in Education (AACE), Savannah, GA, United States, 2016). Retrieved June 4, 2019 from https://www.learntechlib.org/primary/p/172078/ .

J. M. Spector, Thinking and learning in the anthropocene: The new 3 Rs . Discussion paper presented at the International Big History Association Conference, Philadelphia, PA (2018). Retrieved from http://learndev.org/dl/HLAIBHA2018/Spector%2C%20J.%20M.%20(2018).%20Thinking%20and%20Learning%20in%20the%20Anthropocene.pdf .

J. M. Spector, Complexity, Inquiry Critical Thinking, and Technology: A Holistic and Developmental Approach . In Mind, Brain and Technology (pp. 17–25). (Springer, Cham, 2019).

R.J. Sternberg, Intelligence as developing expertise. Contemp. Educ. Psychol. 24 (4), 359–375 (1999)

The Reboot Foundation. (2018). The State of Critical Thinking: A New Look at Reasoning at Home, School, and Work. Retrieved from https://reboot-foundation.org/wp-content/uploads/_docs/REBOOT_FOUNDATION_WHITE_PAPER.pdf

V. Vinge, The Coming Technological Singularity: How to Survive in the Post-Human Era . Resource document. NASA Technical report server. Retrieved from https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19940022856.pdf . Accessed 20 June 2019.

D. Zhang, M. Xie, Artificial Intelligence’s Digestion and Reconstruction for Humanistic Feelings . In 2018 International Seminar on Education Research and Social Science (ISERSS 2018) (Atlantis Press, Paris, 2018)

X. Zhu, in Twenty-Ninth AAAI Conference on Artificial Intelligence . Machine Teaching: An Inverse Problem to Machine Learning and an Approach toward Optimal Education (2015)


Acknowledgements

We wish to acknowledge the generous support of NetDragon and the Digital Research Centre at the University of North Texas.

Initial work is being funded through the NetDragon Digital Research Centre at the University of North Texas with Author as the Principal Investigator.

Author information

Authors and Affiliations

Department of Learning Technologies, University of North Texas, Denton, TX 76207, USA

Jonathan Michael Spector & Shanshan Ma


Contributions

The authors contributed equally to the effort. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Jonathan Michael Spector .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Spector, J.M., Ma, S. Inquiry and critical thinking skills for the next generation: from artificial intelligence back to human intelligence. Smart Learn. Environ. 6 , 8 (2019). https://doi.org/10.1186/s40561-019-0088-z


Received : 06 June 2019

Accepted : 27 August 2019

Published : 11 September 2019

DOI : https://doi.org/10.1186/s40561-019-0088-z


Keywords

  • Artificial intelligence
  • Critical thinking
  • Developmental model
  • Human intelligence
  • Inquiry learning



Open Access

Peer-reviewed

Research Article

The effectiveness of problem based learning in improving critical thinking, problem-solving and self-directed learning in first-year medical students: A meta-analysis

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

¶ ‡ IBAPM is the sole first author of this work.

Affiliations College of Medicine, Taipei Medical University, Taipei, Taiwan, Medical and Health Education Development, Faculty of Medicine, Udayana University, Bali, Indonesia


Roles Conceptualization, Formal analysis, Methodology, Supervision, Validation, Writing – review & editing

Affiliation Department of Education and Humanities in Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan

Roles Conceptualization, Methodology, Supervision, Validation, Writing – review & editing

* E-mail: [email protected]

Affiliations Department of Education and Humanities in Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan, Department of Urology, Taipei Medical University Hospital, Taipei, Taiwan

  • Ida Bagus Amertha Putra Manuaba, 
  • Yi-No, 
  • Chien-Chih Wu


  • Published: November 22, 2022
  • https://doi.org/10.1371/journal.pone.0277339

9 May 2024: Manuaba IBAP, -No Y, Wu CC (2024) Correction: The effectiveness of problem based learning in improving critical thinking, problem-solving and self-directed learning in first-year medical students: A meta-analysis. PLOS ONE 19(5): e0303724. https://doi.org/10.1371/journal.pone.0303724 View correction


The adaptation process for first-year medical students is an important problem because it significantly affects educational activities. A previous study showed that 63% of students had difficulties adapting to the learning process in their first year at medical school. Therefore, students need the most suitable learning method to support the educational process, such as problem-based learning (PBL). This method can improve critical thinking skills, problem-solving and self-directed learning. Although PBL has been adopted in medical education, its effectiveness in first-year medical students is still unclear. The purpose of this meta-analysis is to verify whether the PBL approach has a positive effect in improving knowledge, problem-solving and self-directed learning in first-year medical students compared with the conventional method.

We searched the PubMed, ScienceDirect, Cochrane, and Google Scholar databases up to June 5, 2021. Search terms included problem-based learning, effectiveness, effectivity, and medical student. We excluded studies with final-year medical student populations. All analyses in our study were carried out using Review Manager version 5.3 (RevMan Cochrane, London, UK).

Seven eligible studies (622 participants) were included. The pooled analysis demonstrated no significant difference between PBL and conventional learning methods in the critical thinking/knowledge assessment (p = 0.29), the problem-solving aspect (p = 0.47), and the self-directed learning aspect (p = 0.34).

The present study concluded that the PBL approach in first-year medical students appeared to be ineffective in improving critical thinking/knowledge, problem-solving, and self-directed learning compared with the conventional teaching method.

Citation: Manuaba IBAP, -No Y, Wu C-C (2022) The effectiveness of problem based learning in improving critical thinking, problem-solving and self-directed learning in first-year medical students: A meta-analysis. PLoS ONE 17(11): e0277339. https://doi.org/10.1371/journal.pone.0277339

Editor: Huijuan Cao, Beijing University of Chinese Medicine, CHINA

Received: July 14, 2021; Accepted: October 25, 2022; Published: November 22, 2022

Copyright: © 2022 Manuaba et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The adaptation process for first-year medical students is an important problem because it is one of the factors that significantly affect educational outcomes [ 1 ]. Struggling can occur at any time, but first-year students are particularly susceptible as they adapt to new learning methods at university [ 2 ]. A study on the adaptation process of first-year medical students involving 200 participants showed that 63% of students had problems adapting to the learning process [ 3 ]. Consequently, students need to know the most suitable learning style to support the educational process. In addition, the appropriate learning approach can also help the adaptation process of first-year medical students and maximize their study outcomes. Therefore, educational institutions need to ensure that applied learning methods improve the learning atmosphere for first-year medical students [ 4 ].

Problem-based learning (PBL) encourages students to identify their knowledge and skills to achieve specific goals [ 5 ]. Many studies have evaluated the effectiveness of PBL in the medical curriculum and found that PBL can improve understanding, team performance, learning motivation, student satisfaction, and critical thinking [ 5 , 6 ]. The PBL method not only helps students to understand in depth, but it also encourages independent learning, because students have to formulate their own learning goals after understanding PBL scenarios, solve problems via the literature and the internet, compare scenarios with theories from various sources, and actively participate in group discussions [ 7 ]. PBL has three main learning objectives, namely (1) to apply deep content learning, (2) to apply problem analysis skills and develop solutions to solve problems, and (3) to apply self-directed learning as an approach to adapt learning styles [ 8 ]. Therefore, this teaching model has been highly praised in medical education courses over the past two decades [ 9 ]. In conventional lecture methods, students are passively exposed to the material and less likely to learn or apply concepts actively. Meanwhile, in PBL, students learn actively using case-based peer-to-peer teaching, which stimulates students to learn from lecture materials and independent learning to solve cases under the guidance of a facilitator. The PBL approach aims to promote the integration of learned knowledge, rather than simply implanting knowledge and skills as in the conventional teaching model [ 8 ], and it has also been designed to emphasize active participation, problem-solving, and critical thinking skills compared with conventional medical education practices [ 6 ].

Several reports have shown the effectiveness of PBL for first-year medical students in improving the final score when supported by concept maps compared with a PBL-only group. The average score improved significantly, namely 10.07±3.49 versus 5.97±2.09, p<0.001 [ 10 ]. Another study compared the final score between the PBL method and the conventional method accompanied by a workshop for first-year medical students. The final results were also statistically significant, namely 8.25±0.79 versus 5.46±0.96, p<0.01 [ 11 ]. However, due to the limitations of these studies, the effect of PBL for first-year medical students has yet to be established. Also, there is still no meta-analysis that evaluates this topic to date. Therefore, we conducted a systematic review and meta-analysis to verify whether the PBL approach has a positive effect in improving knowledge/critical thinking, problem-solving and self-directed learning in first-year medical students compared with the conventional method.
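As an aside, group comparisons reported as mean ± SD, like those above, can be checked with a two-sample t-test on summary statistics. The sketch below is illustrative only: the group sizes (n = 60 per arm) are hypothetical placeholders, since they are not given in the passage above, and the call does not reproduce the analysis of the cited studies.

```python
# Illustrative sketch: Welch's t-test from summary statistics such as
# "10.07 +/- 3.49 versus 5.97 +/- 2.09". Group sizes are assumed (n = 60 each).
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=10.07, std1=3.49, nobs1=60,  # PBL plus concept-map group (assumed n)
    mean2=5.97,  std2=2.09, nobs2=60,  # PBL-only group (assumed n)
    equal_var=False,                   # Welch's t-test (unequal variances)
)
print(f"t = {t:.2f}, p = {p:.2e}")     # p falls far below 0.001 for these assumed sizes
```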

Study design

A meta-analysis was performed from March to June 2021 to assess the effectiveness of PBL in improving knowledge/critical thinking, problem-solving and self-directed learning in first-year medical students. To attain our goal, potentially relevant papers were identified and collected from PubMed, Cochrane, ScienceDirect, and Google Scholar to calculate the mean difference and 95% confidence interval (95%CI) using random- and fixed-effect models. We used meta-analysis protocols as a guide for our present study [ 12 ].

Search strategy

We conducted a systematic search in PubMed, Cochrane, ScienceDirect and Google Scholar up to June 5, 2021. The search strategy conformed to medical subject headings (MeSH), involving the use of a combination of the following keywords: (Problem-based Learning [MeSH Major Topic]) AND (effectiveness OR effectivity AND medical student AND first-year). Language constraints were applied in our search policy. When the same results were reported in multiple studies, we only used the most up-to-date analysis with the larger sample size. We also scanned the reference lists of appropriate or qualifying studies for potentially relevant papers by searching "Articles linked". Two independent reviewers identified potentially relevant records (I.B.A.P.M, Y.N). Disagreements between the two independent researchers related to an article were settled by discussion and/or consultation with the senior investigator for a third opinion (C.C.W).

Eligibility criteria and data extraction

The inclusion criteria for this study were: (1) research subjects were medical students in the first year (first or second semester), (2) the study evaluated the knowledge/critical thinking, problem-solving and self-directed learning of the students, (3) the study provided sufficient data for calculation of the mean difference and 95%CI, p-value, and study heterogeneity. Meanwhile, the exclusion criteria were as follows: (1) studies with insufficient data, (2) sample size less than 50, (3) intervention duration less than one year, (4) review, letter to the editor, and comment articles. Data extraction was conducted by two authors (I.BA.P.M, Y.N), who independently screened the collected articles’ titles, abstracts, and full texts. The extracted data were then entered into a Google Spreadsheet by the two reviewers (I.BA.P.M, Y.N). The following information was derived from each article included in this study: (1) first author’s name and year of release, (2) age of the participants, (3) interventional and control method, (4) sample sizes of cases and controls, (5) country of study, (6) study program, (7) duration of PBL intervention, (8) scores of the PBL and control groups. Two independent authors carried out data extraction to prevent human mistakes. If there was a disagreement, a discussion was held to reach a solution.

Quality assessment

Two independent authors (I.BA.P.M, Y.N) assessed the quality of the studies to ensure each sample’s validity and prevent possible exaggeration of each study. The authors used major and minor criteria in assessing the risk of bias for quality assessment. There were four major and four minor criteria. The authors assigned 2 points each to the major criteria and 1 point each to the minor criteria, so that the maximum total score would be 12 points. If an article got 9–12 points, it was assigned as “low risk of bias”; if it got 6–8 points, it was assigned as “medium risk of bias”; and if it got < 5 points, it was assigned as “high risk of bias”. When there was a disagreement between the two authors, a discussion was held. If the conflict was not settled, the two authors discussed it with the third author (C.C.W).
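A minimal sketch of this 12-point rubric in Python is shown below. The point values and score bands come from the description above; the criterion inputs are generic, and the handling of a score of exactly 5 (which falls between the stated "6–8" and "< 5" bands) is an assumption, grouped here with high risk of bias:

```python
# Illustrative sketch of the risk-of-bias rubric described above: four major
# criteria (2 points each) and four minor criteria (1 point each), 12 points max.
# Bands follow the text: 9-12 low, 6-8 medium, < 5 high; a score of exactly 5
# is not covered by the stated bands and is treated as high risk here (assumption).
def risk_of_bias(major_met: int, minor_met: int) -> str:
    """major_met and minor_met are the numbers of criteria satisfied (0-4 each)."""
    score = 2 * major_met + 1 * minor_met
    if score >= 9:
        return "low risk of bias"
    if score >= 6:
        return "medium risk of bias"
    return "high risk of bias"

print(risk_of_bias(major_met=4, minor_met=2))  # 10 points -> low risk of bias
print(risk_of_bias(major_met=3, minor_met=1))  # 7 points  -> medium risk of bias
```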

Statistical analysis

The methodological quality (risk of bias) of each individual trial was assessed before inclusion in the meta-analysis. The Z-test was used to assess the effect of the learning method on self-directed learning (including its subgroup analysis), critical thinking/knowledge, and problem-solving. Forest plots were used to display the group measurements and effect estimates. Heterogeneity was reported using several parameters, namely Chi 2 , Tau 2 , and I 2 . Comprehensive Meta-Analysis (CMA, New Jersey, US) version 2.1 was initially used to assess the effect models. If the p-value of the heterogeneity test was less than 0.10, a random-effect model was used; in contrast, a fixed-effect model was used if the p-value was > 0.10. Our analyses were carried out using Review Manager version 5.3 (RevMan, Cochrane, London, UK) and Comprehensive Meta-Analysis (CMA, New Jersey, US) version 2.1.
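As a rough illustration of how the quantities named above fit together (the actual analyses were run in RevMan and CMA), the sketch below computes an inverse-variance pooled mean difference, Cochran's Q, I 2 , a DerSimonian–Laird Tau 2 , and applies the stated rule of switching to a random-effect model when the heterogeneity p-value is below 0.10. The per-study values are hypothetical placeholders, not data from the included studies, and the choice of the DerSimonian–Laird estimator is an assumption.

    # Sketch only: pooled mean difference, heterogeneity statistics, and the
    # fixed- vs random-effect choice described above. Study values below are
    # hypothetical placeholders.
    import numpy as np
    from scipy import stats

    md = np.array([4.2, -1.0, 6.5, 2.3])   # per-study mean differences (PBL minus control), hypothetical
    se = np.array([1.1, 0.9, 1.6, 1.2])    # per-study standard errors, hypothetical

    w_fixed = 1.0 / se**2                  # inverse-variance (fixed-effect) weights
    md_fixed = np.sum(w_fixed * md) / np.sum(w_fixed)

    # Cochran's Q, degrees of freedom, heterogeneity p-value, I^2, and tau^2 (DerSimonian-Laird)
    Q = np.sum(w_fixed * (md - md_fixed) ** 2)
    df = len(md) - 1
    p_het = stats.chi2.sf(Q, df)
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    C = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
    tau2 = max(0.0, (Q - df) / C)

    # Model choice follows the rule stated in the text: random effects if p(het) < 0.10
    w = 1.0 / (se**2 + tau2) if p_het < 0.10 else w_fixed
    pooled = np.sum(w * md) / np.sum(w)
    se_pooled = np.sqrt(1.0 / np.sum(w))
    ci_low, ci_high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
    z = pooled / se_pooled                 # Z-test for the overall effect
    p_overall = 2 * stats.norm.sf(abs(z))
    print(f"MD = {pooled:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, "
          f"I^2 = {I2:.0f}%, p(het) = {p_het:.3f}, p = {p_overall:.3f}")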

Literature searching

This systematic review and meta-analysis retrieved articles from four databases: PubMed, Cochrane, ScienceDirect, and Google Scholar. A total of 5536 records were identified, of which 11 were removed as duplicates before screening. In the first screening step, 5407 articles were excluded because their titles and abstracts did not match the criteria, leaving 120 articles for further screening. Of these 120 articles, the full text was not available for 39, so 81 articles were assessed for eligibility against the inclusion and exclusion criteria and for risk of bias. Articles were then excluded for the following reasons: no information about intervention duration (n = 16), low sample size (< 50) (n = 17), inappropriate study method (n = 12), and insufficient data (n = 29). Finally, seven articles were included in this review ( Fig 1 ).

Fig 1. PRISMA flow diagram of the study selection process (Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097. doi: 10.1371/journal.pmed1000097; www.prisma-statement.org).

https://doi.org/10.1371/journal.pone.0277339.g001

Baseline characteristics of the included studies and quality assessment

All of the studies were published within the last 20 years, and most were conducted in Asia. The sample sizes of the seven studies ranged from 56 to 131 participants, and the pooled sample was divided into two groups (PBL vs. conventional learning methods). The participants were first-year students in medicine (three articles), dentistry (one article), nursing (two articles), and midwifery (one article). The length of the intervention varied from several months to one year. Gender was not evaluated separately ( Table 1 ). The included studies used a variety of study designs. According to our assessment, two studies had a low risk of bias (9–12 points), and the remaining articles had a medium risk of bias (6–8 points).

Table 1. Baseline characteristics of the included studies.

https://doi.org/10.1371/journal.pone.0277339.t001

Comparison of the effectiveness of PBL and conventional learning methods

The critical thinking/knowledge evaluation.

In the included studies, the comparison conditions consisted of a conventional method (two articles), lecture-based learning (LBL; two articles), a tutorial learning group (one article), and theory-based discussion (one article). Three studies reported a higher pre-test score in the PBL group and three reported a higher pre-test score in the conventional group; overall, the mean pre-test scores of the two groups differed little. Post-test scores improved in both groups after the intervention. Post-test scores in the PBL groups were mostly higher than in the conventional groups, except in the study by Choi et al., which is consistent with that study's conclusion of no significant effect. In addition, Lohman et al. found that the teaching method did not significantly influence students' knowledge ( Table 2 ).

Table 2. Pre- and post-test results in each group and the authors' interpretation of their findings.

https://doi.org/10.1371/journal.pone.0277339.t002

The problem-solving evaluation.

Two articles investigated problem-solving in addition to critical thinking/knowledge. There was little difference between the average pre-test and post-test values; however, Lohman et al. found a significant association between the learning method and the problem-solving aspect, whereas Choi et al. did not find a significant difference between the learning methods for this aspect ( Table 3 ).

Table 3. The problem-solving evaluation in the included studies.

https://doi.org/10.1371/journal.pone.0277339.t003

The self-directed learning evaluation.

Three articles evaluated self-directed learning. In the Lohman et al. study, the course instructor assessed and scored students' self-directed learning, with higher scores indicating a better level of self-directed learning. However, none of the included studies found a significant difference between the learning methods in enhancing self-directed learning ( Table 4 ).

Table 4. The self-directed learning evaluation in the included studies.

https://doi.org/10.1371/journal.pone.0277339.t004

Meta-analysis assessment

Our meta-analysis was organized into three outcome groups: critical thinking/knowledge, problem-solving, and self-directed learning.

Critical thinking/knowledge assessment.

Six articles evaluated critical thinking/knowledge in the conventional and PBL groups. A random-effect model was used because the p-value of the heterogeneity test was < 0.10. Heterogeneity was evaluated using the I 2 parameter; according to the RevMan analysis, I 2 was 93%, which falls in the 75% to 100% range indicating considerable heterogeneity. The pooled effect favored PBL for developing critical thinking, but the difference between PBL and conventional learning methods was not significant (p = 0.29) ( Fig 2 ). Subgroup analyses were performed for this outcome according to the duration of the intervention and the region (Asian vs. western countries) ( Fig 3 ).

Fig 2. Forest plot of the comparison between PBL and conventional learning methods for critical thinking/knowledge.

https://doi.org/10.1371/journal.pone.0277339.g002

Fig 3. Subgroup analyses of critical thinking/knowledge by intervention duration and country. (A) Fixed effect models. (B) Random effect models.

https://doi.org/10.1371/journal.pone.0277339.g003

Moreover, the critical thinking studies were regrouped according to the duration of the intervention (≤ 6 months vs. > 6 months) and country (Asian vs. western). The analysis by intervention duration found no significant difference between PBL and conventional learning methods. Subgroup analyses were assessed using both random-effect and fixed-effect models. The subgroups with an intervention duration of more than six months and with comparisons conducted in western countries showed low heterogeneity (I 2 = 0%, which might not be important). In contrast, high heterogeneity was found in the subgroups with an intervention duration of less than six months (I 2 = 96%) and with comparisons conducted in Asian countries (I 2 = 95%). No statistically significant difference between PBL and conventional learning methods was found in any subgroup, even though the overall effect estimate (the diamond in the forest plot) leaned toward PBL in each subgroup analysis ( Fig 3 ).
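For reference, the small helper below reflects how the I 2 labels used in this section ("might not be important" at 0%, "considerable heterogeneity" at 75–100%) can be applied. The bands follow common Cochrane-style guidance; this is only a rough guide, and intermediate values call for case-by-case judgement.

    # Rough helper mapping I^2 (in %) to the descriptive labels used in this
    # section; intermediate values genuinely require judgement in context.
    def interpret_i2(i2):
        if i2 <= 40:
            return "might not be important"
        if i2 >= 75:
            return "considerable heterogeneity"
        return "moderate to substantial heterogeneity (judge in context)"

    # I^2 values reported above: pooled critical thinking (93%) and subgroups (0%, 95%, 96%)
    for value in (93, 0, 95, 96):
        print(value, "->", interpret_i2(value))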

Problem-solving.

Two studies compared the problem-solving aspect between PBL and conventional learning methods. Heterogeneity between the two studies was high (I 2 = 86%, considerable heterogeneity), so the overall result was analyzed using a random-effect model. The pooled estimate leaned toward conventional teaching for enhancing problem-solving skills, but the difference was not statistically significant ( Fig 4 ).

Fig 4. Forest plot of the comparison between PBL and conventional learning methods for problem-solving.

https://doi.org/10.1371/journal.pone.0277339.g004

Self-directed learning.

Self-directed learning was evaluated using a fixed-effect model. The I 2 parameter indicated no heterogeneity (I 2 = 0%, which might not be important). The overall effect leaned toward the conventional method for enhancing self-directed learning, but the difference was not statistically significant (p = 0.34) ( Fig 5 ).

Fig 5. Forest plot of the comparison between PBL and conventional learning methods for self-directed learning.

https://doi.org/10.1371/journal.pone.0277339.g005

Discussion

Problem-based learning was developed as an alternative to the conventional learning methods used across various disciplines, including the health sciences. PBL emphasizes students' active participation in working through a given problem, in both group and individual settings, and can thereby improve students' skills in analyzing and solving problems [ 5 , 6 ].

Various studies have examined the effectiveness of PBL in the teaching and learning process [ 13 , 16 , 18 ]. Several factors may influence the implementation of PBL, such as students' year of study, the material taught, and the field of knowledge pursued. For the critical thinking/knowledge aspect, we found no significant difference between the conventional learning method group and the PBL group (p = 0.29). This finding likely reflects the lack of association between PBL and enhanced critical thinking/knowledge in the majority of the included studies: three of the six analyzed studies showed non-significant results, and only Tripathi (2015) [ 18 ] was in line with our hypothesis. Choi et al. attributed their non-significant finding (p = 0.7) to the intervention being too short to produce meaningful effects [ 17 ]. However, intervention duration may not be an absolute determinant of PBL effectiveness, as shown by Tripathi [ 18 ], whose study had the shortest intervention duration yet still found significant results. Moreover, research by Li et al. on critical thinking showed a significant difference between the experimental and control groups (p < 0.001) [ 20 ], and Tseng et al. also reported significantly higher critical thinking scores in the experimental group than in the control group (p < 0.0001) [ 21 ].

Sample characteristics can also affect PBL results. In this meta-analysis, we analyzed data from students in their first year. First-year students often struggle to adapt to lecture methods that differ from those used in high school [ 1 ]; this is influenced by various factors, including differences in teaching methods between institutions. Adapting to new environments and habits is a further challenge for first-year medical students. Adaptation to learning methods is a mental and behavioral response to the demands of academic work. Students accustomed to teacher-centered methods therefore tend to have difficulty with the student-centered PBL method in higher education and may struggle to absorb the study materials, which affects the teaching and learning process in the first semester [ 22 ]. These factors could also affect problem-solving and self-directed learning.

Beyond critical thinking/knowledge, the other PBL outcomes examined were problem-solving and self-directed learning. We found that PBL was not superior to conventional learning in enhancing problem-solving (p = 0.47). This may be due to the limited number of studies assessing this outcome: only two studies analyzed problem-solving, and they reported different results. Choi et al. [ 17 ] had a larger total sample and a higher analysis weight (56.9%) than Lohman et al. [ 13 ], so the pooled result tends to follow Choi et al.'s non-significant finding [ 17 ], in addition to the factors explained above.

As with problem-solving, PBL also failed to show any superiority over the conventional learning method in increasing self-directed learning. Two studies of this outcome, Lohman et al. (2002) [ 13 ] and Choi et al. [ 17 ], reported non-significant results, whereas Hayashi et al. (2013) [ 16 ] reported different findings. According to the baseline characteristics, Hayashi's study had a longer intervention than the studies of Lohman et al. [ 13 ] and Choi et al. [ 17 ], which may have affected the results: the participants were exposed to the intervention much longer, so the desired effect could be observed [ 16 ]. The PBL system, which focuses on increasing students' active participation, is expected to improve these outcomes compared to the conventional approach. Research by Tseng et al. (2011), involving 120 nursing students (51 in the experimental group, 69 in the control group), showed a significant difference in self-directed learning scores, with a higher mean in the experimental group (p < 0.0001) [ 21 ]. Three outcomes of PBL were evaluated in this meta-analysis, and none showed a significant effect. Unfortunately, the specific factors that might have influenced these results were not mentioned or explained in detail in the individual studies.

The problem-based learning method has been used widely, and to the best of our knowledge further investigation of this learning method is still needed. A strength of this study is that our meta-analysis evaluated specific outcomes of PBL, namely critical thinking/knowledge, problem-solving, and self-directed learning, whereas several other studies discuss the effect of PBL on general learning outcomes or in specific contexts [ 9 , 14 , 18 , 19 , 23 ]. Our meta-analysis not only provides the pre-test and post-test scores of each group but also explains the outcome of each study. Furthermore, we noted high levels of heterogeneity across studies. Factors that may have caused this heterogeneity include, first, samples drawn from different countries with different backgrounds; second, the different instruments used to evaluate PBL progression in each study; and third, the varying durations of the intervention, which brought different outcomes. Subgroup analysis was conducted to minimize this heterogeneity, but it reduced heterogeneity only for the critical thinking/knowledge aspect, specifically in the subgroups with an intervention duration of more than six months and with comparisons conducted in Western countries. Heterogeneity was not reduced in the subgroups with an intervention duration of less than six months or with comparisons conducted in Asian countries, possibly because of the factors pointed out above. Unfortunately, we could not run further subgroup analyses because of the limited number of studies on this topic.

Additionally, we believe that further primary studies are needed to evaluate the effectiveness of PBL. A multicenter approach is suggested as the most appropriate way to identify the cumulative effect and the differences between geographic areas or ethnic groups. Researchers could also compare educational centers and examine the impact of local culture and technological progress on the implementation of PBL, as studies on these topics are rare. Psychological aspects also deserve attention, because first-year medical students may still rely on the learning habits they acquired in high school, which could affect PBL.

In conclusion, according to our analysis, PBL is not superior to conventional learning in improving critical thinking/knowledge, problem-solving, or self-directed learning in first-year medical students. Our meta-analysis had several limitations: it only evaluated learning outcomes in the first year, and no studies with a multiyear approach were found; we could not equate the instruments used to evaluate PBL and did not analyze the data separately by study program; and we could not assess the socio-demographic factors, particularly social culture, that might contribute to students' learning process. Therefore, a multicenter approach is suggested as the most appropriate way to identify the cumulative effect and the differences between geographic areas or ethnic groups.

Supporting information

https://doi.org/10.1371/journal.pone.0277339.s001

  • 13. Lohman MC, Finkelstein M. to foster problem-solving skill. 2002;121–7.

Soft Skills courses

Courses on soft skills help doctoral candidates to develop their personal, professional and managerial skills.

Soft skills that prepare doctoral candidates to meet the needs of the labour market include:

  • flexibility and adaptability in the workplace; ability to address work challenges;
  • having the tools to manage change, develop innovation, work ethically with entrepreneurial spirit;
  • developing problem solving skills in unstructured situations, critical reasoning and creative thinking;
  • interacting with others, working in teams, working in open, multicultural and flexible environments, negotiating, managing conflicts in organizational contexts;
  • developing leadership skills, decision making and emotional intelligence;
  • mastering the tools for communication, dissemination and public speaking;
  • knowing how to make use of resources, optimizing time, managing projects;
  • managing career development and seizing professional opportunities.

To this end, the Doctoral School has organized its catalogue of soft skills courses into four pathways for skills development, each related to a different job sector. These four pathways serve as a "bridge" between scientific education and professional training, and through them doctoral candidates can knowingly meet the requirement of taking at least 40 hours of soft skills courses over the three-year period of their PhD programme.

  • E-learning / MOC
  • Research and Academia
    – Specific competences: Research Ethics, Integrity and Impact; Research Quality; Research Dissemination; Research Financing
    – General competences
    – Personal competences: Individual Skills; Career Development
  • Industry, companies and professionals: Organization and Decision Processes; Critical Reasoning / Problem Solving
  • Entrepreneurship and start-ups: Entrepreneurship
  • Public sector and public organizations


