Rubrics in AI Evaluation: Enhancing Reliability, Validity, and Fairness Across Domains
Abstract
This research report provides a comprehensive overview of rubrics as evaluation tools in the context of Artificial Intelligence (AI). Moving beyond the specific example of physician-created rubrics for HealthBench, we explore the broader applications, […]
