Advantages of Standardized Testing:
- Can provide assessments that are psychometrically valid and reliable, as well as results that are generalizable and replicable.
- A well-designed standardized test assesses an individual’s mastery of a domain of knowledge or skill, and at some level of aggregation that assessment will provide useful information.
That is, while individual assessments may not be accurate enough for practical purposes, the mean scores of classes, schools, branches of a company, or other groups may well provide helpful information because of the reduction of error accomplished by increasing the sample size.
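The error reduction described above can be illustrated with the standard error of a mean. The following is a minimal sketch, assuming each individual score carries independent random error with the same standard deviation; the function name and the 10-point error figure are hypothetical and do not come from any particular test.

```python
import math


def standard_error_of_mean(error_sd: float, group_size: int) -> float:
    """Standard error of a group's mean score.

    Assumes independent measurement errors with a common standard
    deviation (an illustrative assumption, not a property of any real test).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return error_sd / math.sqrt(group_size)


# Hypothetical measurement error of 10 score points per individual.
for n in (1, 25, 100, 400):
    print(f"group of {n:>3}: standard error = {standard_error_of_mean(10.0, n)}")
```

Under these assumptions the error in the mean shrinks with the square root of the group size, so quadrupling the group halves it. Averaging reduces only random error; a systematic bias shared by all test takers would not shrink this way.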
While standardized tests are frequently criticized as unfair, the psychometric standards applied in the development of standardized tests would produce fairer testing if applied in other types of testing. In particular, the effectiveness of each test item in accomplishing the goal of the test would have to be demonstrated.
Design and Scoring
In practice, standardized tests can be composed of multiple-choice and true-false questions. Such items can be scored inexpensively and quickly, either by machine-reading special answer sheets or through computer-adaptive testing.
Some tests also have short-answer or essay-writing components that are scored by independent evaluators. These evaluators use rubrics (rules or guidelines) and anchor papers (examples of papers for each possible score) to determine the grade to be given to a response.
A number of assessments, however, are not scored by people. For example, the Graduate Record Exam is a computer-adaptive assessment that does not require scoring by people (except for the writing portion).
Scoring Issues
Human scoring raises its own issues. For example, the Seattle Times reported that temporary employees scoring Washington State’s WASL were paid $10 an hour and spent as little as 20 seconds on each math problem and two and a half minutes on each essay. Because these items may determine whether a student graduates from high school, some consider this a matter of concern given the high-stakes nature of such tests. Pearson scores many other state tests similarly. Agreement between scorers can range from 60 to 85 percent, depending on the test and the scoring session. States sometimes pay to have two or more scorers read each paper to improve reliability, though this does not eliminate the possibility that the same response receives different scores.
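One simple way to quantify agreement between two scorers is the share of responses on which they assign the same score. The sketch below assumes "agreement" means an exact match; the source does not say how the reported percentages were computed, and the function name and score lists are hypothetical.

```python
from typing import Sequence


def exact_agreement_rate(scores_a: Sequence[int], scores_b: Sequence[int]) -> float:
    """Fraction of responses on which two scorers gave the identical score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorers must rate the same set of responses")
    if not scores_a:
        raise ValueError("at least one response is required")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical rubric scores (1 to 4) from two scorers on the same ten essays.
scorer_1 = [4, 3, 2, 4, 1, 3, 2, 4, 3, 2]
scorer_2 = [4, 3, 3, 4, 1, 2, 2, 4, 3, 1]

rate = exact_agreement_rate(scorer_1, scorer_2)
print(f"exact agreement: {rate:.0%}")
```

Exact agreement does not correct for matches that would occur by chance, so chance-adjusted statistics such as Cohen's kappa are often reported alongside it.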
Standards
- Evaluation Standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing, and improving the identified form of evaluation. Each standard is placed in one of four fundamental categories, which are intended to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations provide sound, accurate, and credible information about student learning and performance.
- Testing Standards
In the field of psychometrics, the Standards for Educational and Psychological Testing places standards about validity and reliability, along with errors of measurement and related considerations, under the first main topic of test construction, evaluation, and documentation. The second main topic covers standards related to fairness in testing, including fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse linguistic backgrounds, and testing individuals with disabilities. The third and final main topic covers standards related to testing applications, including the responsibilities of test users, psychological testing and assessment, educational testing and assessment, testing in employment and credentialing, and testing in program evaluation and public policy.