The terms normative sample and standardization sample refer to the same concept and are often used interchangeably, though standardization sample is the term more frequently used in statistics and normative sample is more common within psychometrics. A norm-referenced test uses a normative (or standardization) sample drawn from the general population to determine what is “typical” or “normal” in that population.

Test designers choose a sample they believe represents the population the test is meant to evaluate. Most commercially available tests use a sample that reflects the most recent U.S. census information. For example, the PLS-4 normative sample included 1,534 children, and the proportions of children from various regions, socioeconomic backgrounds, disability statuses, and races were based on the 2000 census.
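Matching a sample to census proportions is essentially proportional stratified allocation. The sketch below splits a sample of 1,534 across strata in proportion to population shares and uses largest-remainder rounding so the counts add up exactly. The regional shares are hypothetical placeholders, not actual 2000 census figures, and this is not presented as the procedure any test publisher used.

```python
import math


def allocate(sample_size: int, proportions: dict[str, float]) -> dict[str, int]:
    """Split sample_size across strata in proportion to population shares,
    using largest-remainder rounding so the counts sum to sample_size."""
    total = sum(proportions.values())
    # Exact (fractional) quota for each stratum.
    quotas = {k: sample_size * p / total for k, p in proportions.items()}
    # Start with the whole-number part of each quota.
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = sample_size - sum(counts.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# HYPOTHETICAL regional shares for illustration only, NOT real census data.
regions = {"Northeast": 0.19, "Midwest": 0.23, "South": 0.36, "West": 0.22}
print(allocate(1534, regions))
```

The same function can be applied to any single characteristic (region, socioeconomic status, race); matching several characteristics at once requires crossing the strata, which quickly produces very small cells.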

Normative Samples and Linguistically and Culturally Diverse Children

Children from the same background as the child being evaluated may be included in the normative sample; however, they are rarely present in numbers large enough to make the standard scores representative of a typical child from the client’s background. For example, suppose you are evaluating an African American child from a low-socioeconomic-status background in an urban area. Although 15% of the children in the normative sample may be African American, how many of those children are also from low-socioeconomic-status backgrounds and urban areas? In the end, only a very small percentage of the normative sample shares the child’s background, and the performance the test designers have labeled “typical” overwhelmingly reflects children from backgrounds very different from that of the child being evaluated. As a result, none of these tests is a valid instrument for assessing culturally or linguistically diverse (CLD) children.
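The shrinking subgroup in the example above can be made concrete with a little arithmetic. This sketch uses the PLS-4 sample size of 1,534 and the 15% figure from the example; the shares of that subgroup who are low-SES and who live in urban areas are hypothetical placeholders, and the calculation assumes the characteristics are independent, which real census data need not be.

```python
def matching_children(sample_size: int, proportions: list[float]) -> float:
    """Expected number of children who share every listed characteristic.

    ASSUMPTION: characteristics are treated as independent, so the
    proportions are simply multiplied together.
    """
    expected = float(sample_size)
    for p in proportions:
        expected *= p
    return expected


SAMPLE_SIZE = 1534       # PLS-4 normative sample size cited in the text
african_american = 0.15  # figure from the example in the text
low_ses = 0.30           # HYPOTHETICAL share of that group who are low-SES
urban = 0.40             # HYPOTHETICAL share of those who live in urban areas

n = matching_children(SAMPLE_SIZE, [african_american, low_ses, urban])
print(f"Expected matching children: {n:.1f} ({n / SAMPLE_SIZE:.1%} of the sample)")
```

With these placeholder values, only about 28 of the 1,534 children (under 2% of the sample) would share all three characteristics.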

Normative Samples and “Spectrum Bias”

Research has shown that most standardized language tests are not accurate enough to identify a language disorder in the general population on the basis of test performance alone (Vance & Plante, 1994). Some test designers manipulate the populations used in the standardization study to reach acceptable levels of sensitivity (correctly identifying children who have a disorder) and specificity (correctly identifying children who do not). For example, for the typically developing group they may choose high-performing children from areas with well-educated, high-socioeconomic-status families; research has demonstrated a high correlation between socioeconomic status and performance on standardized tests (Horton-Ikard & Weismer, 2007; Pruitt, Oetting, & Hegarty, 2011). For the language-impaired group, they may choose children with more severe disabilities. In this situation, the typically developing children will score higher than expected because of their backgrounds, while the children with more severe disabilities will score lower than children with mild impairments would. The two groups then barely overlap, and the test appears far more accurate in diagnosis than it would with a representative sample.
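The effect described above can be illustrated with a small calculation. This sketch assumes normally distributed scores with a standard deviation of 15, a hypothetical cutoff of 85, and made-up group means; it models no particular test. It compares a "broad" study (typical children of average background versus children with mild impairments) with a "narrow" study (high-SES typical children versus children with severe impairments), using the same cutoff in both.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def accuracy(cutoff: float, typical_mean: float, impaired_mean: float,
             sd: float = 15.0) -> tuple[float, float]:
    """Return (sensitivity, specificity) when scores below `cutoff` are flagged."""
    sensitivity = normal_cdf(cutoff, impaired_mean, sd)       # impaired children flagged
    specificity = 1.0 - normal_cdf(cutoff, typical_mean, sd)  # typical children not flagged
    return sensitivity, specificity


CUTOFF = 85.0  # HYPOTHETICAL cutoff: 1 SD below a mean of 100

# HYPOTHETICAL group means, chosen only to illustrate spectrum bias.
broad = accuracy(CUTOFF, typical_mean=100.0, impaired_mean=80.0)   # mixed SES, mild impairments
narrow = accuracy(CUTOFF, typical_mean=110.0, impaired_mean=65.0)  # high SES, severe impairments

for label, (sens, spec) in [("broad", broad), ("narrow", narrow)]:
    print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Because both groups in the narrow study sit farther from the cutoff, it reports higher sensitivity and specificity even though the test and cutoff are identical.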




Horton-Ikard, R., & Weismer, S. E. (2007). A preliminary examination of vocabulary and word learning in African American toddlers from middle and low socioeconomic status homes. American Journal of Speech-Language Pathology, 16, 381-392.

Pruitt, S., Oetting, J., & Hegarty, M. (2011). Passive participle marking by African American English-speaking children reared in poverty. Journal of Speech, Language, and Hearing Research, 54, 598-607.

Vance, R., & Plante, E. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15-24.