Blog: The Educated Reporter

Measuring Teacher Effectiveness—Stories and Ideas

On Tuesday, the Bill & Melinda Gates Foundation will release the final report of its three-year study on teacher effectiveness. The Measures of Effective Teaching (MET) project relied on 3,000 teacher volunteers and dozens of independent research teams to produce a set of standards intended to help policymakers and school leaders improve the teacher corps.

The push to better understand the skills strong teachers need stems in part from the tough-to-swallow teacher evaluation results gathered so far, which critics argue reflect rating systems that are too lenient on educators. A study of Michigan teachers found that more than 99 percent of them were rated effective or highly effective last year. In 2010, Secretary of Education Arne Duncan called for more rigorous assessments of teacher effectiveness, noting, “Today in our country, 99 percent of our teachers are above average.” A report that year reached a similar conclusion, finding that less than one percent of teachers received an ineffective rating.

[See EWA Story Starters on Teachers and Leadership & Governance]

Coverage of the report’s findings:

  • The Wall Street Journal writes: “The three-year study by the Bill & Melinda Gates Foundation, published Tuesday, is the first large-scale research to show, using random student assignment, that some teachers can produce test-score gains regardless of the past performance of their students, according to foundation officials.”
  • The Huffington Post notes: “Moreover, the report found that overall, classroom observations — the way most teachers around the country have been evaluated for decades — are highly unreliable on their own. ‘It is clear from these findings and the MET project’s earlier study of classroom observation instruments that classroom observations are not discerning large absolute differences in practice,’ the authors wrote. They found that counting observations for half of the total score is ‘counterproductive.’”
  • Education Week spoke to a few critics of the most recent MET report: “‘These results could still be based on a very selective group of teachers,’ said Jesse M. Rothstein, an assistant professor of economics at the University of California, Berkeley, who has often been critical of the MET findings. ‘I would love to see a lot more investigating of just who was and wasn’t complying, and why they were left out.’ Douglas N. Harris, a professor of economics at Tulane University, in New Orleans, added that the study didn’t address some other potential sources of bias. For instance, the study’s authors also acknowledge that the experiment is limited to comparisons of teachers within, but not across, schools. ‘There are a lot of ways in which there could be a nonrandom assignment of students to teachers,’ Mr. Harris said. ‘They’re studying some elements of that, but not others.’”

Background on the MET Project

Like most projects spearheaded by the Gates Foundation, the MET project is not without detractors. Research groups sympathetic to teacher unions, and those skeptical of using data-driven metrics to evaluate teacher performance, have criticized the Gates Foundation effort.

In a January 2011 response, the National Education Policy Center accused the first MET Project report of predetermining its results.

Matthew Di Carlo of the Albert Shanker Institute put both write-ups in perspective:

Despite my disagreements with some of the Gates Foundation’s core views about school reform, I think that they deserve a lot of credit for this project. It is heavily resourced, the research team is top-notch, and the issues they’re looking at are huge. The study is very, very important — done correctly.

On first read, it might sound like Rothstein is saying that the MET researchers cooked the books. He isn’t. In the very next sentence, he points out that the MET project has two stated premises guiding its work— that, whenever feasible, teacher evaluations should be based “to a significant extent” on student test score gains; and that other components of evaluations (such as observations), in order to be considered valid, must be correlated with test score gains.

And over at Education Week’s Living in Dialogue blog, writer John Thompson laments that lawmakers have rushed to write the Gates Foundation’s theories into law without the caution that usually guides big changes in public policy:

It is especially hard to understand why the Gates Foundation’s opinions about “teacher quality” have already been imposed on urban schools [emphasis his]. Three years after the Gates’ theories became law in many states, the MET will issue a final report on “the most vexing question we face [which] is whether or not our results were biased by the exclusion of important student characteristics from the value-added models.” The MET sample of students was only 56% low income with 8% being on special education IEPs, so it is even harder to see how it could provide evidence relevant to schools serving intense concentrations of extreme poverty. Moreover, it seems that the MET’s economists have overlooked the likelihood that value-added will drive the top teaching talent out of the schools where it is harder to meet test score growth targets.

But supporters of the MET findings are quick to note that the changes they recommend affect more than just teachers. Thomas Kane, one of the lead researchers on the MET Project, appeared at EWA’s National Seminar in Philadelphia last year, where he explained how the project balances student achievement gains, classroom observations, and student feedback. He wrote recently that quantifying the qualitative aspects of a teacher’s work is an obligation researchers and policymakers have to taxpayers and students:

We expect schools to do more than raise achievement on tests, however. Parents hope their children will learn other skills that lead to success later in life, such as an ability to work in teams and persistence. Just because these skills are hard to measure and are not captured directly on any state test need not imply that effective teachers are ignoring them. Indeed, building student persistence may be an effective strategy for raising achievement on state tests. Recent evidence suggests that the teachers with larger student-achievement gains on state tests also seem to have students with greater long-term career success.

Labor groups, sensing an inexorable pivot among lawmakers toward data-driven metrics, have grudgingly yielded to demands that value-added measures be used to assess teacher performance. The pitched battle in Chicago between Mayor Rahm Emanuel and the Chicago Teachers Union ended with the union accepting a contract in which 30 percent of a teacher’s evaluation would be based on student test scores, the minimum percentage permitted under Illinois law.

Other contract negotiations in recent months have been more conciliatory, with school districts and unions in San Jose and Los Angeles tentatively agreeing to more value-added scrutiny (though critics of the L.A. deal argue it uses a weaker formula for flagging low-performing teachers). Meanwhile, a 2012 report suggested that teachers are warming up to value-added measurements as part of their review process.

One form of teacher assessment that has so far avoided the usual headwinds is having students grade their instructors. Students have been shown to predict teacher effectiveness better than more controversial measures like value-added. EWA and The Atlantic magazine have published separate write-ups on this trend.

Perhaps the final piece of the teacher-improvement puzzle is boosting the caliber of talent graduating from the nation’s teacher colleges. Marc Tucker, an expert on international education systems, wrote last year that most other advanced countries have “moved their teacher education programs out of their third tier institutions into their major research universities,” which is “something we have never considered doing.”

Competing efforts have emerged to hold teacher colleges more accountable. Some, like edTPA and a “bar exam” for teachers, have been backed by labor-friendly groups; others aim to scrutinize those colleges using tools made popular by U.S. News & World Report and its college rankings.

Meanwhile, federal support for teacher effectiveness has been generous in funding but thin on demonstrable results, as an EWA blog post points out.

Photo credit: Mikhail Zinshteyn, EWA