Where FairTest Gets It Wrong
October 10, 2016
In a recent report from the National Center for Fair & Open Testing (FairTest), “Assessment Matters: Constructing Model State Systems to Replace Testing Overkill,” the authors deem performance assessments the preferred model for state assessment systems and detail their Principles for Assessment.
High-quality assessment is a critically important issue today, and using assessments to inform and enhance student learning is certainly one of their primary purposes; however, I disagree with many of the report’s conclusions.
Performance assessments often give students an opportunity to engage in extended, complex problems and situations that can be more authentic than a typical objective test question. As ACT has highlighted in our K–12 Policy Platform, assessment formats should vary according to the type of standards and the intended construct being measured; typically, a balance of question types provides the basis for a comprehensive evaluation of student achievement.
In advocating for performance assessments, FairTest incorrectly claims that multiple-choice assessments are limited “to facts and procedures and thereby block avenues for deeper learning.” As ACT research shows in “Reviewing Your Options: The Case for Using Multiple-Choice Test Items,” multiple-choice items can test higher-order thinking skills—by requiring students to, for example, apply information they have learned from a given scenario to a new situation, or recognize a pattern and use it to solve a problem—and do so in an efficient and cost-effective manner. Instead of being dogmatic about a particular assessment format, states and schools need to focus on what is being measured and try to balance innovation and sustainability.
The report also ignores some of the limitations of performance tasks:
they require significantly more time to complete, which reduces instructional time;
they sample relatively few skills, which means scores are based on only a very small subset of standards or content;
they are often expensive to create and time-consuming to score, which delays score reporting; and
they have lower reliability (and score precision) than multiple-choice tests, as the brief sketch below illustrates.
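The reliability point follows from a standard psychometric result, the Spearman–Brown prophecy formula. As a rough sketch, assuming (hypothetically) that each added task is parallel in content and quality to the others, lengthening a test by a factor of k changes its score reliability from r to

r_k = k·r / (1 + (k − 1)·r)

For example, if a single performance task yields a reliability of 0.40, adding two parallel tasks (k = 3) raises it only to about 0.67, still short of the reliabilities commonly reported for full-length objective tests. Because a performance-based form can fit only a handful of extended tasks in the time an objective form devotes to dozens of independent items, k stays small and overall score precision suffers.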
Turning to FairTest’s Principles for Assessment, I disagree that assessment systems should be decentralized and primarily practitioner developed and controlled. Creating a fair, valid, and reliable assessment is difficult and time-consuming work. Before a question is placed on the ACT and scored, a number of extensive and detailed processes need to occur, including multiple reviews by internal and external experts to ensure the item measures what it claims to measure and does not introduce irrelevant information that may make it harder for students to access.
For example, at ACT we try to reduce the language load on math items to ensure that they measure math and not a student’s reading ability. Other testing programs may include extensive reading passages and context in presenting a math item, but we need to ask ourselves: Does the heavy reading load disadvantage a student with limited English experience who otherwise is highly proficient in mathematics? The reviews also ensure that all test questions are culturally sensitive and that test forms as a whole include a balance in terms of culture, gender, and life experience.
Further, test forms are created to match particular content and statistical specifications. This helps to ensure that the assessments are comparable across time, which is necessary to maintain the longitudinal trends used to monitor achievement gaps or measure growth within a classroom, across districts, and/or across schools within a state.
Finally, FairTest includes among its principles that students should exercise significant control where appropriate, for example by deciding whether to include SAT or ACT scores in their college applications. As highlighted in recent ACT research, “More Information, More Informed Decisions,” more sources of student information—not fewer—are needed to better understand a student’s preparedness for college.
In ignoring the realities of cost (both in teacher time and in dollars) that states face in developing their assessment systems, as well as the need for fairness, reliability, and validity in the construction and administration of tests, FairTest inflates some good ideas for innovative item formats into a “system” that many, if not most, states will find difficult to construct or unworkable at scale.
ACT advocates for a holistic view of student learning using multiple sources of valid and reliable information. Performance assessments and teacher-created assessments can be one source of information, but for most states, relying on them exclusively is not feasible due to technical capacity and costs.