Reliability is the degree to which a test measures consistently. A test has a high degree of reliability if it produces similar results under similar conditions.

There are several types of reliability. Inter-examiner reliability and test-retest reliability are especially relevant to language tests. Inter-examiner reliability indicates how consistent a test's results are when different evaluators administer it. A test has good inter-examiner reliability if the scores are similar regardless of who administers the test. Test-retest reliability indicates how consistent a test's results are over time when the test is given multiple times to the same child. If the test has good test-retest reliability, the scores will be similar even after the child has taken it multiple times.
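
One common way to put a number on test-retest consistency is to correlate the scores from two administrations of the same test. The sketch below is a minimal illustration of that idea, not a method prescribed by this document: the function name pearson_r and the score lists are hypothetical, and the scores are invented for illustration only. A coefficient close to 1 suggests that children kept roughly the same relative standing across the two sessions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical standard scores for the same six children, tested twice.
first_scores = [85, 92, 78, 101, 110, 96]
second_scores = [87, 90, 80, 99, 112, 95]

test_retest_r = pearson_r(first_scores, second_scores)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

The same calculation could be applied to inter-examiner reliability by replacing the two sessions with the scores assigned by two different evaluators to the same children.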

Several factors contribute to reliability, such as accuracy in the measurement of the test items. For example, if the test items truly measure the quality they are meant to measure, this supports high reliability. Other factors detract from a test’s reliability, including temporary testing conditions (e.g., distractions, discomfort) and temporary conditions in the individual being tested (e.g., illness, lack of motivation, fatigue).

It is important to note that high reliability does not mean the test has high validity. A test may consistently produce the same standard score, but if the test is invalid, it is only consistently producing an inaccurate result. For example, consider a miscalibrated scale. This scale may show that five pounds of fruit weighs seven pounds, no matter how many times you weigh the fruit. In this example, the scale is highly reliable, but it is not a valid measure of weight.
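
The scale example can be expressed with two simple numbers: the spread of repeated readings (a rough stand-in for reliability) and the gap between the average reading and the true weight (a rough stand-in for validity). The sketch below assumes five identical readings of seven pounds for fruit that truly weighs five pounds; the variable names are hypothetical and chosen only for this illustration.

```python
from statistics import mean, pstdev

# Assumed values taken from the scale example: the fruit truly weighs
# five pounds, but the miscalibrated scale reads seven pounds every time.
TRUE_WEIGHT_LB = 5.0
readings_lb = [7.0, 7.0, 7.0, 7.0, 7.0]

# Spread of the repeated readings: zero spread means perfectly consistent.
spread_lb = pstdev(readings_lb)

# Systematic error: how far the average reading is from the true weight.
bias_lb = mean(readings_lb) - TRUE_WEIGHT_LB

print(f"spread of readings: {spread_lb} lb")
print(f"systematic error:   {bias_lb} lb")
```

A spread of zero alongside a two-pound systematic error is the pattern described here: the readings agree with one another but not with the true weight.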



NYCDOE Initial Guidance Document for Speech and Language Evaluators