Wednesday, January 20, 2010

Summative assessment in disarray

Summative assessment is being stretched to cover learning goals that resist robust, reliable and affordable summation, and these assessment practices are shot through with contradictions, such that the data they provide signify less than people tend to assume.

Higher education institutions are generally expected to have learning goals that are far more extensive and complex than mastery of subject matter alone, and they are being held to account for student achievement in terms of those goals.
Public sector services are nowadays marked by low-trust management systems, when once there would have been a greater readiness to trust that good people engaged on worthwhile activities would learn the sorts of things that were intended. Assessment is supposed to supply evidence to bridge the trust gap.
The quest for reliability tends to skew assessment towards the assessment of simple and unambiguous achievements, and considerations of cost add to the skew away from judgements of complex learning. To put it another way, high stakes assessments have trouble with the complex ambitions of higher education curricula and may actually impede them (Boud, 1995).
Assessment involves making assumptions about what exists, what it is like and how we might know about it. For example, if skills are nothing more than convenient terms for social practices that are decidedly situation-specific, and hence changeable (Holmes, 2001), then it will be frustrating to try to assess skills as if they were real, generalisable achievements.
There are objections to the claim that we have conscious access to all that we know, which raises obvious problems with trying to assess it. There is a strong line of thought which holds that much of what we know is tacit and distributed, and that much learning is non-formal.
The Limits of Reliability: Fictional objects of assessment cannot be assessed with validity, and where validity is lacking, reliability is compromised.
The Stability of Assessment Judgements: Repeated observations are necessary before claiming that the observed behaviour is likely to be stable. This is important because if a higher education institution wishes to warrant achievement, then the warrant should be based on several assessors judging different instances of it. That can hardly be done in a single module, but it might be done if there was an assessment plan covering a complete degree programme.
The Transferability of Achievement: The achievements that grades or degree classes signify may not be very transferable. Many psychologists hold transfer to be an achievement in its own right, not something that flows freely and easily, except in familiar settings where specific transfer heuristics have been routinised (Anderson et al., 2000). Nor do scores and grades say anything about the learner’s ability to perform independently or in novel contexts. They may indicate a performance achieved with the help of plenty of scaffolding or with none. Even where two products are fairly awarded the same grade, there is a real difference between an achievement where the task has been well defined, procedures for success have been laid down and plentiful guidance has been available, and another where no scaffolding has been provided. In other words, it is proper to doubt any assumption that warrants achievements that the learner can readily and independently transfer to fresh settings.
Limitations to Criteria-referencing: Although criteria-referenced assessment has many strengths, particularly compared to norm-referencing, it is important to insist that benchmarks, specifications, criteria and learning outcomes do not and cannot make summative assessment reliable, may limit its validity and certainly compound its costs. For example, educational criteria are necessarily imprecise unless they refer to highly determined, even trivial achievements. Trainers may be able to develop and use precise-looking criteria, but educators work with fuzzy learning outcomes. Even ‘precise’ criteria are fuzzy to the extent: (i) that their meanings emerge in local communities of practice; and (ii) in the context of specific tasks (Wolf, 1997). Although criteria-referenced grading may be good for student learning and equity in a community of practice, differences in the criteria used prevent, even impede, communication between communities; and make it impossible to be sure what any warrant means, since it is not possible to know what criteria have been used, what meanings have attached to them and how they have been used.
Assessment and Curriculum Skew: High stakes assessments, of the sorts that appear on transcripts and that lead to awards, have to be robust enough to stand up to legal challenge, so they tend to rest on assessments of things that people (often wrongly) believe can be judged reliably. This distorts the curriculum in two ways. First, what is subject to high stakes assessment gets serious attention and the rest does not. Secondly, achievements that are not warranted by high stakes assessment are neither recorded nor celebrated. The enacted curriculum becomes what high stakes judgements cover. Non-authentic assessments produce non-authentic curriculum, regardless of what the validated curriculum claims. There is a real danger that the frustrations of trying to assess such accomplishments in reliable ways will lead to the use of national, content-free tests. Not only is their predictive validity in doubt (Sternberg, 1997), but if students concentrate on becoming test-smart, the tests’ consequential validity decreases because they actually distract students from the curriculum designed to teach those things that the tests claim to measure by proxy.

To compound matters, it should be understood that summative assessment may not be able to deliver what it is widely supposed to.

Copied from
Knight, P. (2002). Summative assessment in higher education: practices in disarray. Studies in Higher Education, 27(3), pp. 275-86.
