EDDRA


Education Disinformation Detection and Reporting Agency


-- a Gerald Bracey Report on the Condition of Education



SMALL CLASS SIZE: 1,

ERIC HANUSHEK: 0

SMALL CLASS SIZE: 1,

VOUCHERS: 0

These two boxscores are occasioned by two recent papers issuing from economists at Princeton University. In the first, Alan Krueger demonstrates compellingly that Eric Hanushek is dead wrong when he says money is unrelated to achievement. Hanushek's own data clearly show that money matters. In the second, Cecilia Rouse suggests that if students using vouchers get higher test scores, the reason might well be that they had smaller classes in smaller schools than did their peers in public schools.

Hanushek claimed in a 1989 Educational Researcher article that "there is no strong or systematic relationship between money and achievement." More recently (1997), he has said "there is no strong or consistent relationship between school inputs and student performance."

These statements have always caused some of us to cock our heads in puzzlement, because Hanushek's primitive methodology cannot find a strong or systematic relationship, or find a lack of one, for that matter. As Krueger says in his new paper, "Hanushek never defines the criterion for a strong or consistent relationship…." Keith Baker had earlier made a similar point in the April, 1991 Kappan.

Krueger conducts a more exacting analysis than did Baker. Although Hanushek's methodology is referred to as "vote counting," he does not take a one-study-one-vote approach. He treats each analysis as an independent estimate of the effects of class size (hereafter referred to simply as "estimates"). Thus, if a study analyzed class size effects for elementary schools as a whole, that study gets one vote. But if it analyzed class size effects for grades 1-6 separately, the study gets six votes. If it conducted its analysis for each grade and also for gender and for each of the four largest minority groups, that study--the same study--gets 48 votes (6 grades X 2 genders X 4 ethnic groups). Krueger observes that Hanushek provides no theoretical justification for such treatment. In any case, Hanushek obtained 277 estimates from 59 studies.
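The counting rule just described can be sketched in a few lines of Python. The function name and the way subgroups are passed in are illustrative assumptions; only the totals (48 votes, 277 estimates, 59 studies) come from the text above.

```python
from math import prod

def estimates_from_study(subgroup_counts):
    """Number of 'votes' one study receives when every subgroup
    combination it analyzes is counted as a separate estimate."""
    return prod(subgroup_counts)

# One pooled elementary-school analysis: a single vote.
pooled = estimates_from_study([1])

# Grades 1-6 analyzed separately: six votes.
by_grade = estimates_from_study([6])

# 6 grades x 2 genders x 4 ethnic groups: 48 votes from one study.
fully_split = estimates_from_study([6, 2, 4])

# Average number of votes per study in the tally (277 estimates, 59 studies).
average_per_study = 277 / 59
```

On these numbers the average study casts between four and five votes, so a study that slices its sample finely carries many times the weight of one that reports a single result.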

Krueger found that the studies in Hanushek's dataset that generated large numbers of estimates also generated most of the negative estimates. Where the study yielded only one estimate, 71% were positive, 22% were negative and 7% were unknown. The studies that yielded eight or more estimates produced 57% negative estimates, 19% positive, and 24% unknown. This is peculiar.

Krueger reanalyzed Hanushek's data (which Hanushek had provided), giving each study equal weight--one study, one vote. This alone reverses Hanushek's claim that money doesn't matter. As noted, the methodology doesn't permit the detection of subtleties such as "strong and consistent," but clearly there are more positive studies than negative ones.
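A small Python sketch shows how the two weighting schemes can point in opposite directions. The three studies below are invented for illustration and are not drawn from Hanushek's dataset, and splitting each study's single vote in proportion to the signs of its estimates is an assumption about the mechanics, not a description of Krueger's exact procedure.

```python
from collections import Counter

# Hypothetical studies: each maps to the signs of its class-size estimates.
studies = {
    "study_a": ["positive"],
    "study_b": ["positive"],
    "study_c": ["positive"] * 2 + ["negative"] * 6,
}

def tally_by_estimate(studies):
    """Every estimate is one vote, so prolific studies dominate."""
    totals = Counter()
    for signs in studies.values():
        totals.update(signs)
    return dict(totals)

def tally_by_study(studies):
    """Every study is one vote, split according to the signs it reports."""
    totals = Counter()
    for signs in studies.values():
        for sign, count in Counter(signs).items():
            totals[sign] += count / len(signs)
    return dict(totals)

by_estimate = tally_by_estimate(studies)  # negatives outnumber positives
by_study = tally_by_study(studies)        # positives outnumber negatives
```

With these made-up inputs, the one study that produced eight estimates is enough to tip the estimate-level count negative, even though two of the three studies found a positive effect.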

Krueger then takes a look at the nine studies that generated the most estimates. The results are not pretty. Many studies have little to do at all with class size and many are methodologically problematic, to say the least. About one study that generated 11 estimates, Krueger declares "Class size is just an ancillary variable in a kitchen-sink regression…." A mere nine of the 59 studies generated 122 or 44% of all estimates used. Two studies alone generated 48 estimates. These studies, both by the same authors, carry the titles, "The Merits of a Longer School Day" and "Classmates' Effects on Black Student Achievement in Public School Classrooms." Boy, they sure don't look like direct tests of the impact of class size on achievement to me.

Krueger doesn't comment on one characteristic of these nine studies that jumps out at me: virtually all of them are from the middle and high school years. Everyone I know who is interested in using class size to improve achievement has recommended class size reduction only in the earliest grades.

Perhaps this is the place to invoke, as Krueger does, a quote from Galileo:

"I say that the testimony of many has little more value than that of few, since the number of people who reason well in complicated matters is much smaller than of those who reason badly. If reasoning were like hauling I should agree that several reasoners would be worth more than one, just as several horses can haul more sacks of grain than one can. But reasoning is like racing and not like hauling, and a single Barbary steed can outrun a hundred dray horses."

Krueger then proclaims that "insofar as sample size and strength of design are concerned, I would argue that Tennessee's Project STAR is the single Barbary steed in the class size literature." Krueger is too polite to specify what Hanushek's studies generally and these nine in particular are.

Project STAR was Tennessee's experiment in which, within each participating school, students were randomly assigned to one of three conditions: regular class, regular class with full-time aide, and small class (13-16 students). Students in small classes had higher test scores and these were sustained into the high school years. In another paper Krueger found that students who had spent their early grades in small classes were more likely to take the ACT or SAT college admissions tests. The impact was especially large for black students (an indication of Krueger's thoroughness: although most STAR students remained in Tennessee, with the cooperation of the college admissions testing companies, names of students in Project STAR were matched with names of students taking the SAT and ACT tests in the entire country).

So what does Hanushek make of the "single Barbary steed?" Nothing. That's right. Nothing. Project STAR has never been a part of any of Hanushek's analyses. Why? If you are not seated, please take a chair before reading further: Because it doesn't control for family background characteristics. Random assignment is the sine qua non for breeding a Barbary steed. It is what researchers yearn for, not the kinds of statistical adjustments that are made in most studies, including those in Hanushek's sample. One of the studies above that generated 24 estimates did not control for any family background factors, but Hanushek used it anyway. But Project STAR, which controls for background variables in the best way possible, is not included.

In fact, Hanushek makes virtually no attempt to include any data from educational research journals, only economic journals. One might, though, take a more jaundiced look at his failure to include a study like Project STAR that has received so much attention: Using Hanushek's methodology, Project STAR would generate 75 estimates. Thus, the inclusion of Project STAR alone would reverse Hanushek's conclusions.

Although Hanushek established clear criteria for allowing a study into his analysis, he sometimes ignored them and exercised considerable discretion over what got in and what was eliminated. One Hanushek criterion is that the study must have been published in order to be included. Says Krueger, "In a small number of cases, estimates were misclassified and unpublished estimates selected. Kiesling, for example, was classified as having three estimates of the effect of class size, but there is no mention of a class size variable in Kiesling's article. Hanushek informed me that [he took] Kiesling's estimates from his unpublished thesis, which seems to violate his intention of using [only] published estimates."

Krueger notes that 20% of the studies used in Hanushek's sample do not report the sign of the estimate, something Krueger considers likely to indicate a low quality study. He thus conducted another analysis, weighting each study by the "impact factor" of the journal that published it. Unpublished studies received the same weight as the lowest ranked journal. "The impact factors are based on the average number of citations to articles published in the journals in 1998." When Krueger had weighted all studies equally, the ratio of positive to negative was 1.57 to 1. Weighting the studies for journal impact factor produces a ratio of 1.72 to 1. There are five times as many positive and statistically significant studies as negative and statistically significant (34.5% to 6.9%)--and keep in mind, Krueger here is using the dray horses of the Hanushek database.
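The effect of weighting by journal impact can be sketched the same way. The studies and weights below are hypothetical stand-ins, not values from Krueger's paper; only the 34.5% and 6.9% figures in the last line come from the text above.

```python
def weighted_ratio(studies):
    """Ratio of weighted positive share to weighted negative share.
    Each study is a tuple (weight, positive_share, negative_share)."""
    positive = sum(w * p for w, p, _ in studies)
    negative = sum(w * n for w, _, n in studies)
    return positive / negative

# Hypothetical studies; the weights stand in for journal impact factors.
hypothetical = [
    (2.0, 1.0, 0.0),  # high-impact journal, all estimates positive
    (1.0, 0.5, 0.5),  # mid-impact journal, mixed estimates
    (0.5, 0.0, 1.0),  # low-impact journal, all estimates negative
]

# Equal weighting: every study counts the same.
equal = weighted_ratio([(1.0, p, n) for _, p, n in hypothetical])

# Impact weighting: better-placed studies count for more.
by_impact = weighted_ratio(hypothetical)

# Check on the reported figures: 34.5% vs. 6.9% significant studies.
significant_ratio = round(34.5 / 6.9, 1)
```

In this toy case the ratio moves from 1.0 to 2.5 once weights are applied, the same direction as the shift from 1.57 to 1.72 that Krueger reports, and the reported percentages do work out to a five-to-one ratio.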

In another paper, Krueger notes that Hanushek's analyses have been influential in two arenas: "First Hanushek has testified about his literature summaries in the school financing cases in Alabama, California, Missouri, New Hampshire, New York, Maryland, New Jersey, and Tennessee, and in several congressional hearings." Happily, he hasn't always been successful. The Supreme Court of New Jersey concluded "We return to the plaintiffs' insistent and persuasive question: If these factors [e.g., smaller classes] are not related to the quality of education, why are the richer districts willing to spend so much for them?"

Krueger annotates: "In view of the shaky statistical ground on which Hanushek's literature summary is based, and the qualitatively different results obtained when more plausible [analyses are conducted], this strikes me as a sensible objection."

Hanushek has also influenced the call for vouchers. Hanushek feels schools need a change in incentive structure to make money effective. Krueger demurs: "Before profound changes in schools are made because of a presumed and in my view inaccurate conclusion that resources are unrelated to achievement, compelling evidence of the efficacy of the proposed changes should be required."

One other Hanushek flaw that Baker, but not Krueger, mentioned: Hanushek's data concern the level of achievement, something known to be strongly influenced by family and community variables. His recommendation, though, concerns changes in achievement, something that is much less dependent on non-school factors. His findings, even if they were valid, are not logically connected to his policy recommendation [I'M CHECKING WITH KRUEGER ON THIS--I DON'T THINK KEITH HAD THE INTIMATE FAMILIARITY WITH THE STUDIES THAT KRUEGER DOES].

Rouse, for her part, reviews the results from the Milwaukee voucher program and from a more recent voucher undertaking in New York City. In fact, Rouse cautions that its recency means we need to be tentative about the findings. The findings reported to date are only from the first year of the experiment and the landscape of educational research is littered with the corpses of studies unable to sustain first-year effects.

Rouse points out that the schools attended by voucher students differ in numerous ways from those attended by public school students. They had fewer special education programs and fewer programs for limited English speakers. Voucher schools were less racially segregated, had less fighting and truancy, and had parents more closely in contact with teachers. Perhaps more importantly, they were 30% smaller and had smaller classes. "Overall, it appears that students who were offered a voucher attended better quality schools along several dimensions."

Rouse allows that "there is limited evidence that the achievement of students in the voucher schools may have increased…." (emphasis in the original). However, although the experiment supposedly assigned students randomly to the two groups, the kids who went to public schools scored higher in both reading and math at the start of the experiment. Under conditions of randomization, such systematic differences should not exist and "raise concerns about the validity of the experiment and/or the quality of the data."

In Milwaukee, Rouse does find some advantage for African-American students in choice schools, but also for a group of public schools that received extra funds. Why? Again, she suggests the smaller class size of these schools had something to do with it.

More generally, though, "the research presented here suggests that we must develop a much better understanding of what makes schools effective. The class size effects from Project STAR differed across schools and so did the results in Milwaukee. And yet, we have little understanding of why they did so."

Both papers are working papers of the Industrial Relations Section and accessible at www.irs.princeton.edu/pubs/working_papers.html. Krueger's paper is #447 posted in September, 2000, Rouse's, #440 posted in June.

Posted 10/16/2000


© 2000 Gerald Bracey