Textbook vs. Textbot: Investigating ChatGPT’s Role in Retrieval Practice 

By: Morgan Lewis, PharmD Candidate; Sara Stallworth, PharmD; and Jeff Cain, EdD, MS

Introduction

Nestled in library corners, pharmacy students sit hunched over desks, meticulously poring over their class notes, portraying a picture-perfect scene of “traditional studying.” Yet this romanticized version of studying may not be the best for learning. 

Students often rely on input-based study strategies, such as reading notes or rewatching lecture recordings, with the goal of “cementing” information into the brain. Conversely, retrieval practice involves actively recalling information from memory through some form of self-assessment.1,2 Although retrieval-based practice generally produces better long-term retention, students often neglect it in favor of more passive input-based techniques such as re-reading material or cramming.3,4

Perhaps the emergence of freely available, relatively easy-to-use artificial intelligence (AI) tools can help students adopt retrieval-based practice more widely. Using a retrieval-based practice framework, we decided to test ChatGPT’s ability to create practice questions on pharmacy topics, grade responses to those questions, and provide feedback to learners.

 What we did 

First, our authorship team created a general prompt that could then be customized for three separate trial scenarios in ChatGPT-3.5:

“Create a quiz on [insert topic here] for a [insert class here] at the level of a first-year pharmacy student with 10 questions and check my answers. Present them to me one at a time. After I have answered, tell me if I’m correct or incorrect, show me the correct answer and provide the rationale.  At the end give me a percentage grade and suggest ways I can improve.” 

The three trials, each assigned to one of the project authors, were:

  1. Create questions from pharmacy management class notes copied and pasted into ChatGPT-3.5.
  2. Create questions based on a treatment guideline/algorithm (the Advanced Cardiac Life Support [ACLS] recommendations published by the American Heart Association).
  3. Create questions based on a general course topic (pharmacy management employment law) using publicly available resources.
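For readers who want to reproduce or adapt this setup, here is a minimal Python sketch of how the general prompt could be parameterized for each trial. The function name, parameters, and the optional source_material field are our own illustration; in the actual trials the prompt (and any class notes or guideline text) was pasted directly into the ChatGPT-3.5 web interface.

```python
def build_quiz_prompt(topic, course, num_questions=10, source_material=None):
    """Fill in the general prompt for a specific trial scenario (illustrative helper)."""
    prompt = (
        f"Create a quiz on {topic} for a {course} at the level of a "
        f"first-year pharmacy student with {num_questions} questions and "
        "check my answers. Present them to me one at a time. After I have "
        "answered, tell me if I'm correct or incorrect, show me the correct "
        "answer and provide the rationale. At the end give me a percentage "
        "grade and suggest ways I can improve."
    )
    # Trials 1 and 2 supplied pasted material (class notes, the ACLS guideline);
    # appending it to the prompt is one way that material could be provided.
    if source_material:
        prompt += "\n\nBase the questions on the following material:\n" + source_material
    return prompt


# Trial 3: a general course topic with no pasted material.
print(build_quiz_prompt("employment law", "pharmacy management course"))
```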

In each trial, the authors purposefully answered three questions incorrectly to test the AI’s scoring ability and feedback quality. ChatGPT’s scoring ability was judged by the level of specificity it required for an answer to be correct and was rated as completely correct, partially correct, or completely incorrect. Feedback quality, rated as poor, moderate, or good, was determined by how thoroughly ChatGPT addressed identified issues and explained errors. Each author also timed the exercise from start to finish to track process efficiency and noted any other positive or negative aspects of the experience.

 What we found

Table 1. Results of testing for each condition

| Trial | Time to create (min:sec) | Scoring Ability | Feedback Quality |
|---|---|---|---|
| Class Notes | 6:20 | Completely correct: 10 | Good: 7; Moderate: 3 |
| ACLS Guideline | 5:47 | Completely correct: 9; Completely incorrect: 1 | Good: 8; Moderate: 2 |
| General Course Topic | 3:41 | Completely correct: 10 | Good: 9; Moderate: 1 |

ChatGPT demonstrated proficiency in quickly generating questions and engaging in conversation with each author. We observed that initial feedback tended to be vague; however, targeted prompting, such as requesting “thoughtful feedback” on a particular missed question, produced more valuable insight and background information than the initial response, which provided only the correct answer. Also, under the third condition (the general course topic), some questions extended beyond the scope of what is generally taught in the course.
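The same conversational pattern, quiz prompt, answer, then a targeted follow-up request, can also be scripted. The sketch below is only an illustration: we used the free ChatGPT-3.5 web interface rather than the API, and the model name, helper function, and message contents here are assumptions standing in for that conversation.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set
messages = []      # running conversation history, so earlier turns stay in context


def send(user_text):
    """Append a user turn, request a reply, and keep the full conversation history."""
    messages.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer


# build_quiz_prompt is the illustrative helper from the earlier sketch.
quiz_prompt = build_quiz_prompt("employment law", "pharmacy management course")
print(send(quiz_prompt))         # the first question appears
print(send("My answer is B."))   # graded, with the correct answer and rationale
# Targeted follow-up prompting when the initial feedback is too vague:
print(send("Give me thoughtful feedback on the question I just missed."))
```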

We did encounter setbacks during the testing phase. Specifically, ChatGPT inaccurately graded two of the three quizzes. In the first trial it correctly identified which answers were wrong, while in the second trial it mistakenly marked a correctly answered question as incorrect; despite these discrepancies, both quizzes received a grade of 80% when the true score of each should have been 70%. ChatGPT also struggled with more detailed guidelines but performed better when tested with a more general guideline such as ACLS.
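Because the summary grade cannot be fully trusted, one simple safeguard is to recompute the percentage from the per-question verdicts rather than relying on ChatGPT’s tally. Below is a minimal sketch using hypothetical per-question results that mirror our setup (3 of 10 deliberately missed, so the true score is 70%).

```python
# Hypothetical per-question verdicts: True = answered correctly, False = missed.
# In our trials each author deliberately missed 3 of 10, so the true score is 70%.
results = [True, True, False, True, True, False, True, True, False, True]

true_score = 100 * sum(results) / len(results)
reported_score = 80  # the grade ChatGPT reported in two of our three trials

print(f"Recomputed score: {true_score:.0f}%")  # 70%
if abs(true_score - reported_score) > 0.5:
    print("Warning: ChatGPT's reported grade does not match the tally.")
```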

What this means 

Our pilot project testing the feasibility of ChatGPT for retrieval-based practice was not perfect, but the approach holds promise for augmenting students’ study strategies. Our experience suggests that the free version of ChatGPT can help students study, but it has limitations: it is subject to errors and may not be suitable for all types of courses or content. We subsequently tested the upgraded, paid ChatGPT-4 and found that it processes dense, longer guidelines more accurately and creates more detailed, nuanced questions. Teaching students how to write detailed, effective prompts may also improve question quality, grading accuracy, and feedback quality. Further experimentation is needed with different topics and on different platforms.

As we continue to explore the possibilities of AI in education, we should empower students to use these tools for evidence-based learning. Do you have other examples of how to use AI for retrieval-based practice?

References:

  1. Roediger HL 3rd, Butler AC. The critical role of retrieval practice in long-term retention. Trends Cogn Sci. 2011;15(1):20-27. doi:10.1016/j.tics.2010.09.003
  2. Karpicke JD. Retrieval-based learning: a decade of progress. In: Learning and Memory: A Comprehensive Reference. 2017:487-514. doi:10.1016/b978-0-12-809324-5.21055-9
  3. Karpicke JD, Roediger HL 3rd. Repeated retrieval during learning is the key to long-term retention. J Mem Lang. 2007;57(2):151-162. doi:10.1016/j.jml.2006.09.004
  4. Dunlosky J. Strengthening the student toolbox: study strategies to boost learning. Am Educ. 2013;37(3):12-21. https://eric.ed.gov/?id=EJ1021069


Author Bio(s)

Morgan Lewis is a fourth-year pharmacy student at the University of Kentucky College of Pharmacy. Ultimately, she hopes to pursue a career in psychiatric pharmacy and academia post-graduation. In her free time, Morgan enjoys spending time with her friends and family, taking care of her two cats, and playing in her pickleball league. 

Sara Stallworth is a Postdoctoral Academic Fellow at the University of Kentucky College of Pharmacy. Her educational scholarship interests include utilization of generative AI platforms to augment pharmacy education, pharmacy student stress and wellbeing, and enhancing student engagement in pharmacy education. In her free time, Sara enjoys spending quality time with friends and family, traveling to new places, and trying new recipes. 

Jeff Cain, EdD, MS is an associate professor and vice-chair in the Department of Pharmacy Practice & Science at the University of Kentucky College of Pharmacy. Jeff’s educational scholarship interests include innovative teaching, digital media, and contemporary issues in higher education. In his free time, he is dad to a pole-vaulting daughter, an extreme trail ultramarathoner, and president of For Those Who Would, a 501(c)(3) charity in the adventure and endurance racing communities.


Pulses is a scholarly blog supported by a team of pharmacy education scholars.
