After you have designed a test or assessment for your students, remember that:
"Every test has its limitations."
Assessment
procedures range from highly developed measuring instruments (e.g.,
standardized aptitude and achievement tests) to rather crude assessment devices
(e.g., observational and self-report techniques). You must be aware that even the best educational and
psychological measuring instruments will have various types
of measurement error.
There are three types of limitations:
1. Sampling Error
It is not possible to assess students' knowledge exhaustively, because a teacher cannot test everything in a single examination. Instead, only a sample of the relevant
problems or questions is presented in the test. This results in sampling error: a test that covers only part of the content cannot fully reveal the students' real ability.
Since any test is only a sample of all possible
items, the item sample itself can be a source of error. Longer
tests which consist of many types of test items are typically more reliable because we get a better sample
of the course content and students’ performance.
- Suppose a teacher who wants to measure students' achievement in a unit of biology gives only one essay question in the test. Students who knew the answer to this one question would show
perfect achievement, but students who did not would fail.
Obviously, a one-question test would not provide a reliable
estimate of the students’ knowledge. But as more and more
questions were added, one would obtain a sample that better fits
the unit of instruction and yields scores that more accurately
reflect real
differences in achievement. So by increasing the length of the
test (the size of the sample) we increase the consistency of our
measurement.
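The effect of test length on sampling error can be sketched with a little arithmetic. The snippet below is only an illustration under a simplifying assumption that is not in the original text: each question is drawn at random from the unit, and the student answers it correctly with a probability equal to the fraction of the unit he/she has mastered (a simple binomial model). The function name `score_standard_error` is hypothetical.

```python
import math


def score_standard_error(true_proportion: float, num_items: int) -> float:
    """Spread (standard deviation) of the observed proportion-correct score.

    Assumes a binomial model: every item is answered correctly,
    independently, with probability `true_proportion`.
    """
    return math.sqrt(true_proportion * (1 - true_proportion) / num_items)


if __name__ == "__main__":
    # A student who has mastered 70 percent of the unit (assumed value).
    for num_items in (1, 10, 50, 100):
        spread = score_standard_error(0.7, num_items)
        print(f"{num_items:>3} items: spread of observed score = {spread:.3f}")
```

Under this assumption the spread of the observed score shrinks as the number of items grows, which is the sense in which a longer test gives a more consistent measurement.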
Sampling error is therefore one of the most common problems in educational and
psychological measurement. Even a well-constructed achievement test may not adequately cover a
particular area of instructional content. Similarly, an observational instrument designed to
assess a student's social adjustment may not sample enough behaviour to give a
dependable index of this trait.
Fortunately, sampling error is one kind of error that can be controlled through careful application of established measurement procedures. Examples of such procedures include maximum-performance and typical-performance measures, fixed-choice tests, and complex-performance assessments.
2. Chance Factors
A second
source of error is caused by chance factors
such as guessing on objective tests, subjective scoring on essay tests, errors
in judgement on observation devices, and inconsistent responding on self-report
instruments (e.g. attitude scales).
A longer test also tends to reduce the influence of chance
factors such as guessing. If a teacher gave a ten-item
multiple choice test, a student might know six of the items and
guess at the other four. If the student happened to guess correctly,
he/she would show perfect achievement. If the student happened to
guess incorrectly, he/she would show only 60 percent achievement. If
that test, however, had 100 items, the student's correct guesses
would tend to be balanced by incorrect guesses, and the score would be a
more reliable indication of real knowledge. Increasing the number of questions does not change the chance of guessing any single item correctly, but it reduces the influence of lucky or unlucky guesses on the total score.
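The ten-item versus 100-item comparison can be worked through numerically. The sketch below rests on assumptions that are not in the original text: every item has four answer options, the student knows 60 percent of the items, and each remaining item is a blind guess that is right with probability 1/4. The function name `guessing_summary` is hypothetical.

```python
import math


def guessing_summary(num_items: int, known_fraction: float = 0.6,
                     num_options: int = 4):
    """Return (expected score, spread of score, chance all guesses are right).

    Scores are proportions between 0 and 1. Known items are always
    answered correctly; the rest are independent blind guesses.
    """
    known = round(num_items * known_fraction)
    guessed = num_items - known
    p_correct = 1 / num_options
    expected = (known + guessed * p_correct) / num_items
    spread = math.sqrt(guessed * p_correct * (1 - p_correct)) / num_items
    all_guesses_right = p_correct ** guessed
    return expected, spread, all_guesses_right


if __name__ == "__main__":
    for num_items in (10, 100):
        expected, spread, lucky = guessing_summary(num_items)
        print(f"{num_items} items: expected={expected:.2f}, "
              f"spread={spread:.3f}, P(all guesses right)={lucky:.3g}")
```

Under these assumptions the expected score is the same for both test lengths, but the spread caused by guessing is much smaller for the 100-item test, and a perfect score through luck alone becomes practically impossible.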
- Warning: Lengthening a test improves
reliability only when the additional items are of good quality and
as reliable as the original ones. Adding poor-quality items will
actually introduce error and lower reliability. Furthermore, there
is a point of diminishing returns: if we add too many items, we
risk student fatigue, which will result in lower reliability.
[Source: http://www.indiana.edu/~best/bweb3/test-reliability/]
Nevertheless, through the
careful use of assessment procedures, we are able to keep these errors of
measurement to a minimum.
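The warning above about diminishing returns can be illustrated with the Spearman-Brown formula, a standard result in measurement theory that the original text does not name. It predicts the reliability of a test lengthened by a factor k, assuming the added items are as good as the original ones. It does not model student fatigue, so real gains may be smaller than shown. The starting reliability of 0.5 and the function name `spearman_brown` are illustrative assumptions.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is made `length_factor` times longer.

    Assumes the added items are parallel to (as reliable as) the originals.
    """
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


if __name__ == "__main__":
    original = 0.5  # assumed reliability of the original test
    previous = original
    for factor in (2, 4, 8):
        predicted = spearman_brown(original, factor)
        print(f"{factor}x length: reliability={predicted:.3f}, "
              f"gain over previous step={predicted - previous:.3f}")
        previous = predicted
```

Each doubling of the test length adds less reliability than the one before, even under the favourable assumption that every new item is of good quality.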
3. Incorrect Interpretation
"Different examiners will mark the papers differently."
No two examiners mark papers in exactly the same way, so some may be too lenient while others are too strict. Some examiners interpret the results too rigidly because they follow the marking scheme to the letter. Others go
beyond the assessment criteria and assess the students on more than the test is supposed to measure.
Therefore, it is very important for teachers to know the objective of the assessment and to grade the students according to a standard (a rubric) that assesses precisely those criteria. If the purpose of the examination is to test the students' ability to use sequence connectors in an essay, teachers should focus on that criterion rather than on grammatical or spelling mistakes.
- To address this problem, prospective examiners are sent for several training sessions, and only qualified candidates are chosen as examiners.
We must bear in mind that misinterpretation of test results is
all too common and is one of the major considerations concerning the validity
of an assessment. Avoiding misinterpretation
requires careful attention to what the test actually measures, how accurately
it does so, and its intended uses.
These
limitations of assessment procedures do not negate the value of tests and other
types of assessments. A keen awareness of the limitations of assessment
instruments makes it possible to use them more effectively.
*Keep in mind that
the cruder the instrument, the greater its limitation.*