A normal distribution, also called a bell curve, occurs when the values of a variable (e.g., test scores) plotted on a graph cluster symmetrically around a single mean, with fewer and fewer values appearing the farther one moves from it. In a normal distribution, about 95% of the scores will fall within 2 standard deviations of the mean.
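The share of scores lying within a given number of standard deviations of the mean can be checked directly from the normal cumulative distribution function. The short sketch below does this with Python's standard library; the function name `share_within` is made up for illustration.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 1.5, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

For 2 standard deviations this gives roughly 95.4%, and for 1 standard deviation roughly 68.3%.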

For speech-language pathologists, this is most relevant in understanding norm-referenced tests. When standardizing a test, test designers administer a version of it to a large number of children who make up a standardization, or normative, sample. If the test has been designed well, the scores should fall into a normal distribution with a majority of the scores concentrated around the mean. The standard scores and standard deviations (SD) are derived from the performance of the standardization sample.
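As a rough illustration of how a mean, a standard deviation, and a standard score can be derived from a normative sample, consider the sketch below. The raw scores are invented and far too few for real norming, and the reporting scale (mean 100, SD 15) is an assumed convention rather than something taken from a particular test.

```python
import statistics

# Hypothetical raw scores from a tiny, made-up normative sample.
raw_scores = [108, 114, 117, 120, 120, 120, 123, 126, 132]

sample_mean = statistics.mean(raw_scores)   # average raw score
sample_sd = statistics.stdev(raw_scores)    # sample standard deviation

def standard_score(raw, scale_mean=100, scale_sd=15):
    """Re-express a raw score on an assumed scale with mean 100 and SD 15."""
    z = (raw - sample_mean) / sample_sd     # distance from the mean in SD units
    return scale_mean + scale_sd * z

print(f"mean = {sample_mean}, SD = {sample_sd:.2f}")
print(f"raw score 120 -> standard score {standard_score(120):.0f}")
```

A child who scores exactly at the sample mean receives a standard score of 100 on this assumed scale, and a child who scores one sample SD below the mean receives 85.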

For example, a certain test may have 200 items, worth 1 point each. The test designers give the test to a sample that is, preferably, large and that includes children of all the ages the test is designed for, as well as children from various backgrounds. Suppose 100 five-year-olds take the test and their scores form a bell curve (normal distribution) in which the average score (mean) and the most common score (mode) are both 100. A score of 100 then sits at the 50th percentile, right at the top of the bell curve. The closer a score falls to the middle of the distribution of scores, the more “typical” it is considered to be. Scores that fall far above the “typical” scores are considered exceptional. Scores that fall far below the “typical” scores, usually more than 1.5 to 2.0 standard deviations below the mean, are considered to be indicative of a disorder.
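To see where the 1.5 and 2.0 SD cutoffs fall on the curve, the following sketch converts scores to z-scores and percentiles. It uses the mean of 100 from the example above; the SD of 15 is an assumption added for illustration, since the example does not state one. This is arithmetic only and says nothing about how accurately a cutoff classifies children.

```python
from statistics import NormalDist

# Mean of 100 comes from the example; the SD of 15 is an assumed value.
norms = NormalDist(mu=100, sigma=15)

def describe(score):
    """Return (z-score, percentile) for a score under the assumed norms."""
    z = (score - norms.mean) / norms.stdev
    percentile = norms.cdf(score) * 100
    return z, percentile

for score in (100, 77.5, 70):
    z, pct = describe(score)
    print(f"score {score}: z = {z:+.1f}, percentile about {pct:.1f}")
```

Under these assumptions, a score 1.5 SD below the mean (77.5) falls near the 7th percentile, and a score 2.0 SD below the mean (70) falls near the 2nd percentile.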

Importantly, no commercially available test is considered accurate enough to identify a disorder based on the score alone. In fact, research has demonstrated that these tests do not consistently and appropriately classify children as typically developing or language impaired (Dollaghan & Horner, 2011; Spaulding, Plante, & Farinella, 2006).

Additionally, commercially available norm-referenced tests are based on standardization samples that are not representative of bilingual or diverse children. Therefore, norm-referenced tests are not valid or reliable instruments for the diagnosis of language disorders in bilingual or diverse children. They also often contain significant linguistic, cultural, and socioeconomic status (SES) bias.


Validity Part 1: Construct Validity

Validity Part 2: Validity, SES, and the WISC-IV Spanish

Validity Part 3: ELLs, IQs, and Cognitive Tests

NYCDOE Initial Guidance Document for Speech and Language Evaluators


Dollaghan, C. A., & Horner, E. A. (2011). Bilingual language assessment: A meta-analysis of diagnostic accuracy. Journal of Speech, Language, and Hearing Research, 54, 1077-1088.

Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools, 37, 61-72.