What follows is a guest post by Dan Goldhaber on L’affaire Sanders:
Having spent the past five years or so looking pretty carefully at the impacts of NBPTS certification on teachers and students in North Carolina, I’ve been following the trials and tribulations of NBPTS and the Sanders Three with great interest. Thus, I was happy to oblige when Andy (who clearly needs a nickname – suggestions welcome) generously offered me the opportunity to weigh in on the debate.
What bemuses me about the entire situation is the way in which the findings from my research with Emily Anthony and the Sanders Three have often been reported (though not on this blog): as if they directly conflict with each other. It’s endemic in the press and policy circles to boil research down to its simplest form — thus, when findings are shorn of their original nuances and caveats (remember, we are talking about researchers, so there are lots of these) and condensed into sound bites, they can frequently appear contradictory when, in fact, that may not be the case.
So what are the differences between our respective studies? Let’s begin with the data. There are several differences in the study samples: the Sanders Three focus on two large districts from 1999-00 to 2002-03, while we use a sample that includes all districts in the entire state from 1997-98 to 1999-00. The Sanders Three examine grades 4-8, whereas we focus on grades 3-5.

There are also some differences in our methodologies and variables (we included quite an extensive set of student, teacher, and school variables that we believed to be correlated with student achievement), as well as in the comparisons we make. It is on this point that I take issue with the Sanders Three paper, as it claims that their hierarchical linear modeling (HLM) methodology is superior. HLM is a terrific methodology for accounting for the nested nature of educational data (in other words, it can account for the fact that students are clustered in classrooms, so an event that affects one student may affect all of them), but it’s not the only methodology that does this, and it does have some shortcomings. In particular, it does not allow for the possibility that the error term is correlated with included variables. This is quite likely given that teachers are not randomly matched to students: for example, parents who help and encourage their children at home may also get them into “better schools” or assigned to “the best” teachers, and a significant amount of research does in fact show that higher-achieving students are more likely to be taught by more credentialed and experienced teachers. That said, correcting for this possibility (which Emily and I do with school and student fixed-effects models) tends to decrease the magnitude of the NBPTS effect, implying that it is unlikely that the Sanders Three non-NBPTS findings result from a failure to address this potential source of bias.
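To see why non-random matching matters, here is a minimal toy simulation of my own (not the actual North Carolina data or either paper’s model; all the numbers are made up for illustration). When higher-ability students are more likely to draw an NBPTS teacher, a naive comparison of scores by teacher type absorbs the ability difference and overstates the certification effect, while differencing within student (a simple student fixed-effects estimator) lets the unobserved ability cancel out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000          # students, each observed in two consecutive years
true_effect = 0.10  # assumed "true" NBPTS effect, in test-score SD units

ability = rng.normal(0, 1, n)      # unobserved student ability
p = 1 / (1 + np.exp(-ability))     # higher ability -> more likely an NBPTS teacher
nbpts1 = rng.binomial(1, p)        # teacher type in year 1
nbpts2 = rng.binomial(1, p)        # teacher type in year 2

score1 = true_effect * nbpts1 + ability + rng.normal(0, 1, n)
score2 = true_effect * nbpts2 + ability + rng.normal(0, 1, n)

# Naive cross-sectional estimate: regress score on teacher type, ignoring ability
x = np.concatenate([nbpts1, nbpts2]).astype(float)
y = np.concatenate([score1, score2])
naive = np.polyfit(x, y, 1)[0]

# Student fixed effects: difference within student, so ability cancels exactly
fe = np.polyfit((nbpts2 - nbpts1).astype(float), score2 - score1, 1)[0]

print(f"naive estimate: {naive:.2f}   fixed-effects estimate: {fe:.2f}")
```

With this setup the naive estimate lands far above 0.10, while the within-student estimate comes in close to the true effect, which is the direction of correction described above.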
So what’s the bottom line here? There may well be differences between the two sets of findings, but it’s hard to tell given that we focus on different samples and employ somewhat different methodologies. My guess is that the differences are more a question of emphasis than of substance. Emily and I find more “statistically significant” NBPTS effects, but it’s important to remember that a statistically significant finding doesn’t mean that an indicator will be a perfect predictor of an outcome, or that statistical significance equates with substantive significance for policy. College graduation is certainly a predictor of future earnings, but there are a lot of people without college degrees who earn high salaries, and there are also plenty of unemployed college graduates. Similarly, a statistically significant NBPTS finding does not mean that all (or even a large majority of) National Board Certified Teachers will necessarily be more effective than non-NBCTs.
That said, I believe that the Sanders Three emphasis on the within-category (NBPTS vs. non-NBPTS) variation in teacher quality and the overlap in the distribution of teacher effectiveness is absolutely correct. I conducted some follow-up research (which will be published this summer in Education Finance and Policy) that also found a significant amount of overlap. Specifically, I found that NBPTS teachers would be predicted to outperform non-NBPTS teachers around 55-60% of the time—better than a coin flip, but certainly no slam dunk when it comes to choosing the better teacher. This issue of overlapping distributions is coming up more and more frequently (see, for instance, Gordon, Kane and Staiger, 2006, or Goldhaber, 2006), and it ought to inform the way we view teacher policies.
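A quick back-of-the-envelope calculation shows how small a mean gap that 55-60% head-to-head figure implies. Assuming (my simplification, not a claim from any of the papers) that both groups’ effectiveness distributions are normal with equal spread, the chance a randomly drawn NBPTS teacher beats a randomly drawn non-NBPTS teacher depends only on the gap between the means, measured in teacher-effect standard deviations:

```python
from math import erf, sqrt

def p_nbpts_better(gap_sd: float) -> float:
    """P(random NBPTS teacher outperforms random non-NBPTS teacher),
    assuming both effectiveness distributions are normal with equal
    spread and means `gap_sd` teacher-effect SDs apart."""
    z = gap_sd / sqrt(2)               # SD of the difference of two draws
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

for gap in (0.0, 0.18, 0.36):
    print(f"gap = {gap:.2f} SD -> P(NBPTS better) = {p_nbpts_better(gap):.2f}")
# gap = 0.00 SD -> P(NBPTS better) = 0.50
# gap = 0.18 SD -> P(NBPTS better) = 0.55
# gap = 0.36 SD -> P(NBPTS better) = 0.60
```

Under these assumptions, a 55-60% win rate corresponds to mean effectiveness gaps of only about 0.2-0.35 teacher-effect SDs, which is exactly why the distributions overlap so heavily.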
So, is it worth investing in the NBPTS model? I think it’s still too soon to say, but I’ll leave that to the policymakers. I do want to suggest that we try to move the policy discussion beyond the well-worn ‘NBCTs are better/worse than non-NBCTs’ debate. We ought to be concerned about the effects on different types of students, whether NBCTs have effects on students other than those in their own classrooms, and whether there are impacts beyond those we can detect in student test scores. A great deal of research on NBPTS should be coming online soon, which will hopefully shed more light on the value of this credential in different teaching contexts. It might also provide information that the National Board can use to improve its process. Coming back to my original point, we would probably be better off if the policy debate moved from “is it good or bad?” to “how can it be made better?” –Dan Goldhaber