Making Benchmark Testing Work
The article discusses six criteria that can be used to evaluate the validity of benchmark tests: alignment, diagnostic value, fairness, technical quality, utility, and feasibility (see also Linn, Baker, & Dunbar, 1991). Based on these criteria, the authors summarize their perspective in six recommendations for educators:
Align standards and benchmark assessments from the beginning of test development. Focus on the big ideas of a content area and counteract curriculum narrowing by designing benchmark tests that allow students to apply their knowledge and skills in a variety of contexts and formats.
Enhance the diagnostic value of assessment results through initial item and test structure design. Use extended-response items to reveal student thinking and potential misconceptions. Build distracters into multiple-choice items that reveal common student misunderstandings.
Ensure the fairness of benchmark assessments for all students, including English language learners and students with disabilities. Avoid unnecessarily complex language or specific contexts that could unfairly confound some students’ ability to show what they know.
Insist on data showing tests’ technical quality. Study psychometric indices to determine the reliability and precision of the assessments.
Build in utility. Design reports of test results to be user-friendly and to provide guidance on how to appropriately interpret and use the results.
Hold benchmark testing accountable for meeting its purposes. Crafting good benchmark tests and ensuring their wise use for improving student learning requires systematic design and continual evaluation.
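The recommendation on technical quality mentions psychometric indices without naming any. As a rough illustration only, the sketch below computes two widely used ones, Cronbach's alpha (an internal-consistency reliability estimate) and the standard error of measurement (a precision estimate). The choice of these two indices, the function names, and the sample scores are assumptions made for this example and are not taken from the article.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Estimate internal-consistency reliability.

    `scores` is a list of rows, one per student, each row holding that
    student's score on every item. This layout is an assumption of the sketch.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows (students) into columns (items).
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


def standard_error_of_measurement(scores):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    total_sd = sqrt(variance([sum(row) for row in scores]))
    return total_sd * sqrt(1 - cronbach_alpha(scores))


# Made-up data: four students, three items scored 0 or 1.
example_scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
alpha = cronbach_alpha(example_scores)
sem = standard_error_of_measurement(example_scores)
```

A higher alpha and a smaller standard error of measurement point to more consistent and more precise scores, which is the kind of evidence the recommendation asks educators to request before adopting a test.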
This article presents a comprehensible review of the important technical-quality aspects of benchmark assessments. It gives educators useful recommendations about the design and utility of these assessments, as well as guidance for purchasing them.