William & Mary · CSCI 455/555 · Spring 2026
module 03 / 12 · ~50 min

Evaluating Rigorously

Master the metrics for evaluating AI systems that write and understand code, from token overlap through semantic alignment to functional correctness testing.
