Dummies in Assessments: Remember the limitations!

After you have designed a test or assessment for the students, you have to remember that:

"Every test has its limitations."

Assessment procedures range from highly developed measuring instruments (e.g., standardized aptitude and achievement tests) to rather crude assessment devices (e.g., observational and self-report techniques). You must be aware that even the best educational and psychological measuring instruments will have various types of measurement error.

There are three types of limitations:

1. Sampling Error

It is not possible to assess our students' knowledge thoroughly, because a single examination cannot test everything. Instead, a test presents only a sample of the relevant problems or questions. This results in sampling error: the "incomplete" test cannot fully capture the students' real ability.

Since any test is only a sample of all possible items, the item sample itself can be a source of error. Longer tests that include many types of test items are typically more reliable because they give us a better sample of the course content and of students’ performance.

Suppose a teacher who wants to measure students' achievement in a biology unit gives only one essay question in the test. Students who knew the answer to this one question would show perfect achievement, but students who didn’t would fail. Obviously, a one-question test would not provide a reliable estimate of the students’ knowledge. But as more and more questions are added, the sample fits the unit of instruction better and yields scores that more accurately reflect real differences in achievement. So by increasing the length of the test (the size of the sample), we increase the consistency of our measurement.
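The link between test length and consistency can be sketched with the Spearman-Brown formula, a standard psychometric result that this post does not name. The sketch below assumes the added items are of the same quality as the original ones, and the starting reliability of 0.50 is a made-up value for illustration only.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test lengthened by `length_factor`.

    Assumes every added item is as good as the original items
    (the "parallel items" assumption behind the formula).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a short test with reliability 0.50.
base_reliability = 0.50

for factor in (1, 2, 4, 8):
    predicted = spearman_brown(base_reliability, factor)
    print(f"{factor}x as long -> predicted reliability {predicted:.3f}")
```

Under these assumptions, each doubling of the test raises the predicted reliability, but by a smaller amount each time.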

Sampling error is therefore one of the most common problems in educational and psychological measurement. Even a good achievement test may not adequately sample a particular area of instructional content. Similarly, an observational instrument designed to assess a student’s social adjustment may not sample enough behaviour to give a dependable index of this trait.

Fortunately, sampling error is one kind of error that can be controlled through careful application of established measurement procedures. Examples of such procedures include maximum-performance and typical-performance measures, fixed-choice tests, and complex-performance assessments.

2. Chance Factors

A second source of error is chance factors such as guessing on objective tests, subjective scoring on essay tests, errors in judgement on observation devices, and inconsistent responding on self-report instruments (e.g., attitude scales).

A longer test also tends to reduce the influence of chance factors such as guessing. If a teacher gave a ten-item multiple-choice test, a student might know six of the items and guess at the other four. If the student happened to guess all four correctly, he/she would show perfect achievement. If the student happened to guess all four incorrectly, he/she would show only 60 percent achievement. If that test had 100 items, however, the student’s correct guesses would tend to be balanced by incorrect guesses, and the score would be a more reliable indication of real knowledge. Increasing the number of questions does not make any single guess less likely to be correct, but it reduces the effect that lucky or unlucky guessing has on the overall score.
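The ten-item versus 100-item comparison can be worked through with a little arithmetic. This is only a sketch: it assumes four answer options per item (so a blind guess is right 25 percent of the time), that the student knows 60 percent of the items on either test, and that guesses are independent. None of these figures beyond the "six of ten" example come from the post.

```python
import math


def guessing_spread(n_items: int, known_fraction: float = 0.6, p_guess: float = 0.25):
    """Return (expected % score, SD of % score, chance of a perfect score).

    The student answers the known items correctly and guesses blindly
    on the rest; each guess is right with probability `p_guess`.
    """
    known = round(n_items * known_fraction)
    guessed = n_items - known
    # Average score: known items plus the expected number of lucky guesses.
    expected_pct = 100 * (known + guessed * p_guess) / n_items
    # Spread of the score caused by guessing alone (binomial standard deviation).
    sd_pct = 100 * math.sqrt(guessed * p_guess * (1 - p_guess)) / n_items
    # Probability that every single guess happens to be right.
    p_perfect = p_guess ** guessed
    return expected_pct, sd_pct, p_perfect


for n in (10, 100):
    expected, sd, perfect = guessing_spread(n)
    print(f"{n} items: expected {expected:.0f}%, spread (SD) {sd:.1f} points, "
          f"chance of 100% = {perfect:.2e}")
```

Under these assumptions the expected score is the same on both tests, but the spread caused by guessing is much smaller on the 100-item test, and a perfect score by luck becomes practically impossible.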

Warning: Lengthening a test improves reliability only when the additional items are of good quality and as reliable as the original ones. Adding poor-quality items will actually introduce error and lower reliability. Furthermore, there is a point of diminishing returns: if we add too many items, we risk student fatigue, which will also lower reliability.

[Source: http://www.indiana.edu/~best/bweb3/test-reliability/]

Nevertheless, through the careful use of assessment procedures, we are able to keep these errors of measurement to a minimum.

3. Incorrect Interpretation

"Different examiners will mark the papers differently."

Examiners mark papers differently, so some may be too lenient while others are too strict. Some examiners interpret the results too precisely because they follow the marking scheme too rigidly. Others go beyond the assessment criteria and assess students on more than what the test is supposed to measure.

Therefore, it is very important for teachers to know the objective of the assessment and to grade the students according to a standard (a rubric) that assesses precisely that objective. If the purpose of the examination is to test the students' ability to use sequence connectors in an essay, teachers should focus on that criterion rather than on grammatical or spelling mistakes.

To reduce this problem, prospective examiners are sent for several training sessions, and only qualified candidates are chosen as examiners.

We must bear in mind that misinterpretation of test results is all too common and is one of the major considerations concerning the validity of an assessment. Avoiding misinterpretation requires careful attention to what the test actually measures, how accurately it does so, and its intended uses.

These limitations of assessment procedures do not negate the value of tests and other types of assessments. A keen awareness of the limitations of assessment instruments makes it possible to use them more effectively.

*Keep in mind that the cruder the instrument, the greater its limitations.*

Tuesday, December 17, 2013