According to Oosterhof, Conrad, and Ely (2008), constructed-response assessments include short and long essay tests and fill-in-the-blank tests (also called completion items). The authors discuss the advantages and disadvantages of these assessments, which arise from the nature of the test itself as well as from writing and grading it. I will discuss completion items first, and then I will cover essay tests.
Completion tests have three advantages: they are easy to write, students must supply an answer rather than choose one, and a single test can include many items. They are relatively easy to write because they usually measure recall of information rather than procedural knowledge, and because they do not require scoring plans such as those that essay tests require. Since students must supply an answer instead of picking one (as they would on a multiple-choice test), scores are not distorted by guessing the way multiple-choice scores are. Therefore, completion tests are generally more reliable than selected-response tests. Because these tests can include many items, they can sample the content more completely.
Completion tests have two limitations: they generally measure only the recall of information, and they are more prone to scoring errors than other formats. Completion items measure knowledge of facts and thus do not generally require the higher-level thinking skills that a good educational program aims to develop. Since these questions can often have several correct responses, there is a real chance that they will be mechanically scored incorrectly. For example, if a student gives the plural form of an answer instead of the singular, the response could be counted wrong (even though it is correct) if the instructor or designer did not list the plural as acceptable when the test was designed and put online.
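The plural/singular problem described above can be illustrated with a minimal sketch of how a completion item might be scored mechanically. This is not taken from Oosterhof et al.; the function names and the sample item are hypothetical, and the sketch assumes the designer supplies every acceptable variant up front.

```python
def normalize(response: str) -> str:
    """Lowercase the response, trim it, and collapse internal whitespace."""
    return " ".join(response.lower().split())


def score_completion(response: str, accepted: set) -> bool:
    """Return True if the normalized response matches any accepted answer."""
    return normalize(response) in {normalize(answer) for answer in accepted}


# Hypothetical item: "Tests in which students supply the answer are called ____ items."
# The designer lists both the singular and the hyphenated forms as acceptable.
accepted_answers = {"completion", "fill-in-the-blank", "fill in the blank"}

print(score_completion("  Completion ", accepted_answers))  # accepted variant
print(score_completion("essay", accepted_answers))          # not an accepted variant
```

Any correct variant the designer forgets to list would still be marked wrong, which is exactly the scoring-error risk this format carries.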
When designing these types of tests, take several steps to ensure reliability and validity. Ensure that the items measure the behavior required in the instructional objective, that the reading level of the test is below the reading level of the students so that reading proficiency does not affect test results (unless the test is an actual reading test), and that the items are written in very direct language. Also, be sure that the blanks represent key words from the learning; otherwise, the test will measure reading comprehension more than achievement of the learning objectives. As mentioned already, ensure that only a single response, or a very homogeneous set of responses, counts as correct. Do not use sentences taken directly from the class readings. Doing so encourages students to memorize instead of reading and comprehending, and sentences taken out of a paragraph lose their contextual clues and can therefore be misinterpreted. Place the blank at the end of the item rather than at the beginning or middle so that students can read the item and supply an answer more easily and quickly. Use only one blank per item so that students can quickly understand what the question is asking and so that the set of correct responses stays small. Finally, if an item requires a numerical answer, state the units expected in the response so that a student does not give a correct answer in a different unit and have it marked wrong. For example, if the answer is 36 inches, specify that the answer must be given in inches; otherwise, the student might answer 3, as in 3 feet.
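The 36 inches versus 3 feet example can be sketched in code to show why the unit matters to a mechanical scorer. The sketch below is an illustration under stated assumptions, not a method from the text: the conversion table, the function names, and the rule that a response must be "number unit" are all hypothetical.

```python
from typing import Optional

# Conversion factors to inches (an assumed, deliberately small set of units).
TO_INCHES = {"inch": 1.0, "inches": 1.0, "in": 1.0,
             "foot": 12.0, "feet": 12.0, "ft": 12.0}


def to_inches(response: str) -> Optional[float]:
    """Parse a response such as '3 feet' into inches; return None if unparseable."""
    parts = response.lower().split()
    if len(parts) != 2 or parts[1] not in TO_INCHES:
        return None
    try:
        value = float(parts[0])
    except ValueError:
        return None
    return value * TO_INCHES[parts[1]]


def score_length(response: str, expected_inches: float) -> bool:
    """Return True if the response, converted to inches, equals the expected value."""
    value = to_inches(response)
    return value is not None and abs(value - expected_inches) < 1e-9


print(score_length("36 inches", 36.0))  # same unit as the key
print(score_length("3 feet", 36.0))     # equivalent answer in another unit
print(score_length("3", 36.0))          # bare number: the unit cannot be recovered
```

A bare "3" cannot be scored fairly by any rule, which is why stating the required unit in the item itself remains the simpler safeguard.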
Essay tests have three advantages and three disadvantages. The advantages are that they measure more directly the behaviors specified by the instructional objectives, they examine the learner’s ability to communicate ideas in writing, and they give instructors insight into the thinking behind students’ answers, which can reveal a student’s logic. Since an instructional objective can often be rewritten as an essay question, essay items can measure the behavior more directly. It is therefore important to take care when writing performance objectives. Even though essay tests can help instructors measure a student’s ability to express thoughts in writing, the goal of the test should be to measure how well the student met the instructional objectives. If the instructor also wants to evaluate writing ability, two scores should be used: one for how well the student met the objective(s) and one for writing ability. It is also important not to use the essay test as a means of teaching students to write. It is not an effective teaching method because it is a testing situation, not a learning situation. Finally, the last advantage is that the instructor can confirm the student is not using faulty logic to reach an answer. With a multiple-choice test, for example, even when a student chooses the correct answer, it is not possible to see why they chose it. With an essay question, the student has to answer correctly and explain the reasoning.
The disadvantages are that essay tests sample less of the content than other formats, their scoring can be very subjective, and they take more time to score. Essay tests cannot sample as much of the content because of the time it takes a student to respond to each question. Because so few items can be included, teachers must take time to write well-developed, strong essay items. They should also create a scoring plan for each essay question that defines a correct answer and states how many points each critical element of the essay will earn. This need for a scoring plan points to the second disadvantage: the scoring can be very subjective and can even include bias. Finally, the third disadvantage is that essay tests take longer to score than other formats.
When designing essay tests, teachers should try to follow certain guidelines. The response required by the essay item may be brief or extended. Extended items may not be appropriate for online settings if the question asks learners to demonstrate more than one skill. Instead, it would be better to break the long question into shorter questions. This allows the scoring to be more consistent and allows a broader range of skills to be assessed. Of course, this is only possible if the questions ask students to demonstrate declarative or procedural knowledge. If the question asks online students to solve a problem, a format other than an essay test should be used. Therefore, avoid asking online students to solve problems in an essay item.
To develop high-quality essay items, teachers should apply six criteria. First, ensure that the item measures the specific skill or instructional objective. One way to accomplish this is NOT to allow students to choose which items they will answer. If you do allow a choice, be sure that all choices assess the same capability. Second, the reading level should be below that of the learner so that the item measures the skill and not the student’s reading level. Third, the question should take no more than ten minutes to answer. Otherwise, it is an extended-response item, which should be avoided in online settings. Fourth, a good scoring plan must be devised to ensure the validity of the test. Fifth, the scoring plan should describe a correct and complete response so that scorers can identify correct responses more accurately. Last, the item should be written in such a way that a knowledgeable learner can determine the characteristics of a correct answer.
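A scoring plan of the kind described above can be written down as a simple data structure. The following is a minimal sketch, assuming the plan is a list of critical elements with point values and that the scorer marks each element as present or absent; the class name, function name, and sample essay item are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Criterion:
    """One critical element of a correct, complete response and its point value."""
    description: str
    points: int


def score_essay(plan: List[Criterion], met: List[bool]) -> int:
    """Sum the points for every criterion the scorer marked as met."""
    if len(plan) != len(met):
        raise ValueError("Each criterion needs exactly one met/not-met judgment.")
    return sum(c.points for c, ok in zip(plan, met) if ok)


# Hypothetical item: "Explain why completion items are less affected by guessing
# than multiple-choice items."
plan = [
    Criterion("States that students must supply the answer", 2),
    Criterion("Contrasts this with choosing among listed options", 2),
    Criterion("Links reduced guessing to more reliable scores", 1),
]

# The scorer judged the first two elements present and the third missing.
print(score_essay(plan, [True, True, False]))
```

Writing the plan before grading begins means every response is judged against the same described elements. If writing ability is also evaluated, a second plan of the same shape could hold that separate score.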
When grading online essay tests, teachers can do certain things to ensure more consistent scoring. They should read all of the students’ answers to one item before reading the responses to the next item. This helps instructors complete the scoring more quickly and maintain a clear idea of the expectations for that item, instead of going through one paper at a time and trying to recall the expectations for every item. Teachers should always read the responses in a random order, and the papers should be reordered after each item is scored, because research has shown that the quality of a previous paper can affect the scoring of the next paper. Another precaution is to conceal the identity of the student while grading so that it cannot bias the score. Using multiple readers is another way to improve consistency. Finally, items that lead students to answer the same question in widely different ways should be revised before being used again, because they make it difficult to substantiate whether students have met the instructional objective.
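These grading precautions (score one item at a time, reshuffle the papers for each item, and hide student identities) can be combined into a single grading queue. The sketch below is one possible arrangement under assumed inputs, not a feature of any particular learning management system; the function name, the "R001"-style codes, and the sample IDs are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def build_grading_queue(
    student_ids: List[str], item_ids: List[str], seed: Optional[int] = None
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (student -> anonymous code, ordered list of (item, code) to grade)."""
    rng = random.Random(seed)

    # Assign anonymous codes in shuffled order so a code reveals nothing
    # about the student's position in the roster.
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    codes = {sid: f"R{n:03d}" for n, sid in enumerate(shuffled, start=1)}

    # Grade item by item, with a fresh random paper order for every item.
    queue: List[Tuple[str, str]] = []
    for item in item_ids:
        order = list(codes.values())
        rng.shuffle(order)
        queue.extend((item, code) for code in order)
    return codes, queue


codes, queue = build_grading_queue(["ana", "ben", "cy"], ["Q1", "Q2"], seed=0)
for item, code in queue:
    print(item, code)  # the scorer sees only the item and the anonymous code
```

The mapping from student to code would be kept away from the scorer until all items are scored, and the same queue could be handed to a second reader when multiple readers are used.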
References
Oosterhof, A., Conrad, R., & Ely, D. (2008). Assessing learners online. Upper Saddle River, NJ: Pearson Education, Inc.