Posted By Doug Peterson
A lot more goes into planning a test than just writing a few questions. Reliability and validity should be established right from the start. An assessment’s results are considered reliable if they are dependable, repeatable, and consistent. The assessment is deemed to be valid if it measures the specific knowledge and skills that it is meant to measure. Take a look at the following graphic from the Questionmark white paper, Assessments through the Learning Process.

An assessment can be consistent, meaning that a participant will receive the same basic score over multiple deliveries of the assessment, and that participants with similar knowledge levels will receive similar scores, yet not be valid if it doesn’t measure what it’s supposed to measure (Figure 1). This assessment contains well-written questions, but the questions don’t actually measure the desired knowledge, skill or attitude. An example would be a geometry exam that contains questions about European history. They could be absolutely excellent questions, very well-written with a perfect level of difficulty … but they don’t measure the participant’s knowledge of geometry.
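One common way to put a number on "the same basic score over multiple deliveries" is to correlate each participant's score on one delivery with their score on a second delivery. The sketch below is only an illustration: the scores are made up, and the names `pearson`, `first_delivery` and `second_delivery` are hypothetical, not taken from the white paper. A value near 1 suggests consistent results, but it says nothing about validity. The geometry exam full of history questions could score just as high.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 scores each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Made-up scores for the same five participants on two deliveries.
first_delivery = [62, 71, 80, 88, 95]
second_delivery = [60, 73, 79, 90, 94]

r = pearson(first_delivery, second_delivery)
print(f"test-retest correlation: {r:.2f}")
```

With real data you would want far more than five participants before reading much into the result.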
If an assessment is not reliable, it can’t be valid (Figure 2). If five participants with similar levels of knowledge receive five very different scores, the questions are probably poorly written, confusing or misleading. In this situation, there’s no way the assessment can be considered to be measuring what it’s supposed to be measuring.
Figure 3 represents the goal of assessment writing – an assessment made up of well-written questions that deliver consistent scores AND accurately measure the knowledge they are meant to measure. In this situation, our geometry exam would contain well-written questions about geometry, and a participant who passes with flying colors would, indeed, possess a high level of knowledge about geometry.
For an assessment to be valid, the assessment designer needs to understand not only the specific purpose of the assessment (e.g., geometry knowledge) but also the target population of participants. Understanding the target population will help the designer ensure that the assessment measures what it is supposed to measure and not extraneous information. Some things to take into account:

  • Job qualifications
  • Local laws/regulations
  • Company policies
  • Language localization
  • Reading level
  • Geographic dispersion
  • Comfort with technology

For example, let’s say you’re developing an assessment that will be used in several different countries. You don’t want to include American slang in a test being delivered in France; at that point you’re not measuring subject matter knowledge, you’re measuring knowledge of American slang. Another example would be if you were developing an assessment to be taken by employees whose positions only require minimal reading ability. Using “fancy words” and complicated sentence structure would not be appropriate; the test should be written at the level of the participants to ensure that their knowledge of the subject matter is being tested, and not their reading comprehension skills.
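If you want a rough check on reading level, one option is a readability formula such as the Flesch-Kincaid grade level, which estimates a U.S. school grade from sentence length and syllables per word. The sketch below is a minimal illustration under some assumptions: the syllable counter is a crude vowel-group heuristic, and the function names and sample questions are hypothetical. Treat the result as a rough signal for comparing two wordings, not as a precise measurement.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, drop a likely silent final 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Two hypothetical wordings of the same geometry question.
plain = "Pick the shape with three sides."
fancy = "Identify the polygon characterized by possessing precisely three distinct edges."

print(f"plain wording: grade {flesch_kincaid_grade(plain):.1f}")
print(f"fancy wording: grade {flesch_kincaid_grade(fancy):.1f}")
```

Both questions test the same geometry fact, but the second wording should score noticeably higher, which is a hint that it may be testing reading comprehension along with geometry.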
In my next installment, we’ll take a look at identifying content areas to be tested.