A Gerald Bracey Report on the Condition of Education
EDDRA Articles
THOSE MISLEADING SAT AND NAEP TRENDS

Gerald W. Bracey is an Associate of the High/Scope Educational Research Foundation and an Associate Professor at George Mason University. His most recent books are The War on America's Public Schools (Allyn & Bacon, 2002) and Put to the Test: An Educator's and Consumer's Guide to Standardized Testing (revised edition, Phi Delta Kappa International, 2002). The opinions are his own.

The average SAT Verbal score in 2002 was precisely the same as it was in 1981: 504. Yet each of the six major ethnic categories used by the College Board shows an increase over that period: whites, 8 points; blacks, 19; Asians, 27; Mexican Americans, 8; Puerto Ricans, 18; and American Indians, 8. How can it be that every group making up the national average has gained, while the national average has not budged in 21 years? This is not a trivial question: critics of the schools have cited these national averages as evidence that education reform has produced no progress. To understand it, let's look first at the SAT trends, Verbal and Mathematics, for the various ethnic groups and for all testtakers combined.

                        Verbal                   Mathematics
                  1981*   2002   Gain       1981   2002   Gain
Black              412     431    +19        391    427    +36
Asian              474     501    +27        512    569    +57
Mexican            438     446     +8        447    457    +10
Puerto Rican       437     455    +18        428    451    +23
American Indian    471     479     +8        463    483    +20
All testtakers     504     504      0        494    516    +22

What on earth is going on here? For blacks, Asians, and Puerto Ricans, the gain in Math scores exceeds, and sometimes far exceeds, the gain for all students. The Verbal scores show an even more paradoxical outcome: every group shows an increase, but the gain for the whole group is exactly zero. Nil. To understand how Simpson's Paradox affects SAT averages over time, we must look at how the ethnic composition of the SAT testtaking group has changed. Table 1 below shows these changes.
Table 1. Ethnic composition of SAT testtakers, 1981 and 2002

                       1981                 2002
                  Number      %        Number      %
White             19,383     85       698,659     65
Black             75,434      9       122,684     11
Asian             29,753      3       103,242     10
Mexican           14,405      2        48,255      4
Puerto Rican       7,038      1        14,273      1
American Indian    4,655      0         7,506      1

(The 2002 percentages sum to only 92 percent because 8 percent of testtakers chose "Latin American" or "Other," response categories that were not used in 1981.)

The changing composition of the SAT testtaking group causes the paradox. Minorities now make up a much larger proportion of the total than they did two decades ago. And, except for the Mathematics scores of Asians, all minority scores remain below the overall average even though they are rising. Adding more and more of these improving, but still below-average, scores attenuates the rise in the overall average. For the Verbal score, it attenuates the rise all the way to zero.

Simpson's Paradox can be stated in many ways, but every version conveys the same idea. When subgroups' scores on a variable are aggregated into a single total group, the total can show a relationship that is the reverse of the relationship seen within the subgroups. Hence, the paradox. In the SAT example, Simpson's Paradox strikes because the composition of the whole group changes over time: there were many more minority testtakers in 2002 than in 1981. Simpson's Paradox also affects one-time comparisons in which the groups being compared contain very different mixes of subgroups. The following medical example shows how this happens. If we compare survival rates for patients in two hospitals, the overall results look like this:

              Survived   Died   Total   Survival Rate
Hospital A       800      200    1000        80%
Hospital B       900      100    1000        90%

Hospitals are dangerous places generally, but it looks as if, should you have to check into one, Hospital B is your medical facility of choice. But what if we divide the patients into those who were in good condition before treatment and those who were in poor condition?
Good Condition Patients
              Survived   Died   Total   Survival Rate
Hospital A       590       10     600        98%
Hospital B       870       30     900        97%

Poor Condition Patients
              Survived   Died   Total   Survival Rate
Hospital A       210      190     400        53%
Hospital B        30       70     100        30%

Thus, Hospital B had a higher survival rate for all patients combined. But Hospital A treated a much higher proportion of patients who were in bad shape to begin with (40 percent of its patients versus 10 percent at Hospital B), and it kept a higher proportion of its patients alive in both groups. Hospital A is the place for you whether you arrive in good or poor condition.

NAEP Reading, average scores
          1971   1999
Age 17     285    288
Age 13     255    259
Age 9      208    212

Over a period of 28 years, scores changed little. "NAEP reading scores are essentially unchanged," said right-wing pundit George F. Will in his March 2, 2003, column. "This refutes the durable delusion that schools' cognitive outputs vary directly with financial inputs." This is a common comment from the Right. Spending has increased ("soared," "skyrocketed," and "mounted" are words the critics commonly use), but test scores are "flat" ("stagnant," "sluggish," "static": choose your term). As with the SAT, though, looking at trends by ethnic group reveals something different from what the aggregate for all groups shows:

NAEP Reading by ethnic group, average scores
             White          Black          Hispanic
          1971   1999    1971   1999    1975!   1999
Age 17     291    295     238    264     252     271
Age 13     261    267     222    238     232     244
Age 9      212    221     170    186     183     193

! Hispanics constituted too small a sample to generate a reliable estimate in the 1971 assessment. Asians were still too small a group to report separately in 1999.

The changes for white students roughly mirror those for the whole sample. The gains for black and Hispanic students, though, are much larger than those for the entire group. However, their scores remain lower than those of whites. Because blacks and Hispanics now make up a larger proportion of the total group, Simpson's Paradox attenuates the gains seen when all groups are combined. The proportion of whites in the sample falls from roughly 80 percent to roughly 70 percent (it varies slightly by age).
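The mechanism behind each of these examples is plain arithmetic. An aggregate average or rate is a weighted average of the subgroup figures, with each subgroup weighted by its share of the whole. When the shares differ (Hospital A versus Hospital B) or shift (1971 versus 1999), the aggregate can move against every subgroup. The Python sketch below checks this with the hospital counts from the tables above. It is offered as an illustration, not as the author's method, and the names `COUNTS`, `stratum_rate`, `condition_share`, and `overall_rate` are hypothetical.

```python
from fractions import Fraction

# Patient counts from the hospital tables in this article:
# (hospital, condition on arrival) -> (survived, total patients)
COUNTS = {
    ("A", "good"): (590, 600),
    ("A", "poor"): (210, 400),
    ("B", "good"): (870, 900),
    ("B", "poor"): (30, 100),
}
CONDITIONS = ("good", "poor")


def stratum_rate(hospital: str, condition: str) -> Fraction:
    """Survival rate within one condition group, kept as an exact fraction."""
    survived, total = COUNTS[(hospital, condition)]
    return Fraction(survived, total)


def condition_share(hospital: str, condition: str) -> Fraction:
    """Fraction of a hospital's patients who arrived in the given condition."""
    hospital_total = sum(COUNTS[(hospital, c)][1] for c in CONDITIONS)
    return Fraction(COUNTS[(hospital, condition)][1], hospital_total)


def overall_rate(hospital: str) -> Fraction:
    """Overall survival rate as the share-weighted average of the group rates."""
    return sum(
        (condition_share(hospital, c) * stratum_rate(hospital, c) for c in CONDITIONS),
        Fraction(0),
    )


if __name__ == "__main__":
    for h in ("A", "B"):
        by_group = {c: round(float(stratum_rate(h, c)), 3) for c in CONDITIONS}
        print(f"Hospital {h}: overall {float(overall_rate(h)):.1%}, by group {by_group}")
```

Exact fractions keep the weighted sums free of floating-point rounding. Hospital A's 80 percent works out to 0.6 × 98.3 percent + 0.4 × 52.5 percent. Hospital B's 90 percent works out to 0.9 × 96.7 percent + 0.1 × 30 percent. B's overall advantage therefore comes entirely from the weights, since A has the higher rate in each group.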
The proportion of blacks rises from about 14 percent to about 16 percent, while the proportion of Hispanics doubles, from about 5 percent to about 10 percent. Asians were not represented as a separate group until the 1996 science assessment, and even in that year there was concern about the accuracy of their estimated scores.

A simple ten-score illustration:

 1.     500     510
 2.     500     510
 3.     500     510
 4.     500     510
 5.     500     510
 6.     500     510
 7.     500     510
 8.     500     430
 9.     500     430
10.     400     430
       ____    ____
Avg.    490     486

-----
*1981 is used as a starting point because it was the first year the College Board published a document showing SAT data by gender and ethnicity. Coincidentally, 1981 also marked the low point of the decline in average SAT scores that had begun in 1963. The Board's "Latin American" category, which covers Central and South American students, was not in use in 1981. Students in that category now account for 4 percent of all SAT testtakers, and in 2002 they scored 458 on the Verbal and 464 on the Math. Another 4 percent now check "Other," also a category not used in 1981. They scored 502 on the Verbal and 514 on the Math.
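The ten-score illustration above rests on the same arithmetic. Both kinds of score rise (500 to 510 and 400 to 430), yet the average falls from 490 to 486. This happens because the lower scores fill three of the ten places in the second column instead of one. A short Python sketch reproduces those numbers. The "higher" and "lower" group labels, and the names `FIRST`, `SECOND`, `overall_mean`, `group_means`, and `group_shares`, are a hypothetical reading of the example, not the author's.

```python
from statistics import mean

# The ten scores from the illustration, first and second columns.
# Tagging 500/510 as "higher" and 400/430 as "lower" is a hypothetical
# labeling used only to show the subgroup arithmetic.
FIRST = [("higher", 500)] * 9 + [("lower", 400)] * 1
SECOND = [("higher", 510)] * 7 + [("lower", 430)] * 3


def overall_mean(scores):
    """Average of every score, ignoring group labels."""
    return mean(score for _, score in scores)


def group_means(scores):
    """Average score within each labeled group."""
    groups = {}
    for label, score in scores:
        groups.setdefault(label, []).append(score)
    return {label: mean(values) for label, values in groups.items()}


def group_shares(scores):
    """Fraction of all scores that fall in each labeled group."""
    labels = [label for label, _ in scores]
    return {label: labels.count(label) / len(labels) for label in set(labels)}


if __name__ == "__main__":
    for name, scores in (("First column", FIRST), ("Second column", SECOND)):
        print(name, overall_mean(scores), group_means(scores), group_shares(scores))
```

Both group means go up from the first column to the second. The overall mean goes down because the share of lower scores triples.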
© 2003 Gerald Bracey