 
Prompting LLMs to Notice Student Thinking in Small Group Discussions
Building on student thinking is central to effective instruction in K–12 mathematics education, yet the foundational practice of noticing student thinking remains a persistent challenge. As a high-leverage practice that teachers can learn and are expected to enact across diverse classrooms and content areas, noticing student thinking requires sustained opportunities for professional development and ongoing support for consistent implementation. However, the need for expert human judgment makes the work difficult to scale. This study explores whether large language models (LLMs) can support this practice by comparing GPT-4o’s annotations of student thinking in small-group math discussions to those of expert educators. We operationalize noticing student thinking as identifying pre-specified mathematical moves in transcripts and assess alignment using Krippendorff’s unitized alpha, a span-level agreement metric. Human–model agreement approaches human–human agreement, and, in some cases, the model identifies meaningful discourse moments that individual raters miss. While discrepancies remain, qualitative analysis suggests LLMs could offer useful reference points for teachers. Rather than acting as expert noticers, LLMs may scaffold teacher reflection and support scalable, content-specific professional learning.
EDS Students
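
To make the agreement measure concrete, the sketch below computes standard Krippendorff's alpha for nominal data over token-level labels. Note that this is a simplified proxy, not the full unitized alpha (u-alpha) the study reports, which additionally accounts for how raters segment the transcript into spans. The reliability-data format, label names, and example ratings are all illustrative assumptions, not drawn from the study's data.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of units; each unit is a list of labels, one per rater
           (None marks a missing rating). Here a "unit" could be one
           transcript token labeled with a mathematical-move category.
    """
    # Build the coincidence matrix: within each unit that has at least
    # two ratings, every ordered pair of values contributes 1/(m - 1).
    coincidences = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two raters is unpairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    n = sum(coincidences.values())
    if n <= 1:
        return float("nan")

    # Marginal frequency of each category in the coincidence matrix.
    marginals = Counter()
    for (a, _), w in coincidences.items():
        marginals[a] += w

    # Observed and expected disagreement under the nominal distance
    # (0 if the two labels match, 1 otherwise).
    d_observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    d_expected = sum(
        marginals[a] * marginals[b]
        for a in marginals for b in marginals if a != b
    ) / (n * (n - 1))

    if d_expected == 0:
        return float("nan")  # only one category observed; alpha is undefined
    return 1.0 - d_observed / d_expected
```

A hypothetical usage, pairing an expert rater's token-level labels with the model's over the same transcript (categories such as "conjecture" and "justify" are placeholders for the study's pre-specified mathematical moves):

```python
expert = ["none", "conjecture", "conjecture", "none", "justify"]
model  = ["none", "conjecture", "none",       "none", "justify"]
print(krippendorff_alpha_nominal(list(zip(expert, model))))  # ~0.71
```

Reducing spans to per-token labels discards boundary information, which is precisely what the unitized variant preserves; the sketch is meant only to show the disagreement-ratio logic (alpha = 1 - observed/expected disagreement) common to both statistics.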
