Brill Responds

Per the post below where the American Federation of Teachers’ talking points memo accuses him of fabricating quotes in The New York Times Magazine, Steven Brill writes to say:

If I were going to “fabricate” a quote, why would I pick one that is so unsurprising? After all, Ms. Weingarten’s own website quotes her on March 13, when President Obama announced plans to include more Race-like contests in traditional federal school aid, as declaring that teachers “should be empowered and supported—not scapegoated. [Emphasis mine.] We are surprised and disappointed that the Obama administration proposed this as a starting point for reauthorizing the Elementary and Secondary Education Act.”

In fact, in my initial draft of the article I had that quote in there, too, but took it out because it was semi-redundant with the one Ms. Weingarten now disputes — and which my notes (written on a reporter’s pad while standing as we left a restaurant) reflect.

That this quote is not reflected in Mr. Powell’s notes is no surprise to me; I didn’t see him take any, and he certainly wasn’t taking them as we were leaving a restaurant and Ms. Weingarten, after remarking that Race to the Top “doesn’t really involve that much money,” added her point about President Obama. In fact, as it happens, when I talked repeatedly with her last summer for the article about Rubber Rooms and I mentioned the upcoming “Race,” she said almost exactly the same thing, as my notes then also reflect.

I appreciate all of the time that Ms. Weingarten and her colleagues gave me in my preparation for this article, and I regret that they feel compelled to challenge this quote. However, I am gratified that they did not challenge any of the other reporting in the article and look forward to Ms. Weingarten’s cooperation and her keen willingness to engage candidly on these issues in the future.

Steve Brill

10 Replies to “Brill Responds”

  1. Mr. Brill,

    I’m curious what your notes said regarding the paraphrase “linking student progress to testing will be fair because tests don’t take subjective factors into account, nor would allowing subjective evaluations by principals be fair.”

    Why would Ms. Weingarten rest her case on the subjective factors lacking in test growth models when there is overwhelming objective, scientific evidence that they are not valid for evaluation purposes? As explained by numerous scientific organizations, VAMs rely on test scores in an inappropriate way, they only work for homogenous populations, and grow more invalid in the segregated worlds of urban secondary schools. They do not allow for factoring out peer effects or the increasing difficulty of reversing the decline in performance of students as they age when they can’t read for comprehension.

    The lack of enforcement of behavioral and attendance standards, and the coercing of teachers into just passing kids on, are the results of ADMINISTRATIVE policies. Yet we are supposed to hand over this flawed data to administrators.

    I just reviewed my year’s class rolls and saw that I had between 18 and 30 mentally ill students among my 210-student load that also included numerous felons, and where the majority were on IEPs and ELLs. During our annual meltdown during high-stakes testing we had four teachers assaulted in three weeks, and many more cussed out during that time, and the aggressors were just given a $260 ticket for disturbing the peace and sent back to class.

    And how do you factor the effects of gang wars and funerals into the estimating of a teacher’s growth target?

    Yet you want a statistical model devised in the homogenous world of 5th grade math, before the Big Sort really takes off, to calculate my growth target. You might as well create a growth target for my kids (where we are 49th in education spending) based on test scores in Massachusetts.

    If “reformers” wanted a deal, The Grand Bargain makes sense, turning flawed data over to peer review committees.

    But we see from your Ed Week notes what “reformers” think about peer review when peers read the evidence differently.

    And that’s crucial. You “reformers” are afraid of the evidence-driven debates of social science, so you use sound bites instead.

  2. John,

    All your points are good points, as usual. But I’m wondering:

    1. Do you think there is, or will ever be, a good way to evaluate teacher effectiveness? (And what would that way be?)

    2. Do you think teachers are a “special case population” whose efforts and situations are so varied and so complex that they cannot be evaluated under any circumstances? (And, if so, why do you think the work of teachers is harder to evaluate than, say, that of doctors, lawyers, firemen, policemen, emergency service workers, engineers, waiters, bartenders, sanitation workers, etc.?)

    I really trust your judgment as you know. And, like many people in the country, I am trying to figure out what I think about the general issue of teacher evaluation. Every person I know who isn’t a teacher and who has a job, gets evaluated in their work. And teachers certainly spend a lot of time evaluating kids. So what I’m imagining is that there has to be some reasonable way to evaluate teachers. But I defer to your experience and expertise on the issue and would really like to know what you think a good evaluation system might look like if you think such a system could exist.



  3. Steve,

    The challenge would be much easier if we a) stopped the blame game, and b) concentrated on removing the bottom 8 to 10% of teachers. Then we could build on those successes.

    I’ve never worked with a principal who wasn’t so overwhelmed that they didn’t go months at a time without thinking about classroom instruction. And why take the time to process the most obviously incompetent if you probably can’t find a replacement? So the problem is the horrific conditions in our toughest schools that chew up and spit out administrators, teachers, and students.

    In my experience, people want to do good. They want to work collaboratively. They’d love to have respectful relationships with each other and with the kids and parents. And for examples of models that draw on the best, and the most tough-minded, of human nature, readers should follow your links. Steve, you have more practical knowledge of these policy approaches than I do.

    Honestly I wish that growth models had never been invented. But I’d go with The Grand Bargain. Dealing with flawed data in peer reviewing teams is what adults do.


  4. John,

    Just to follow up – what does that recommendation mean in a practical sense?

    I’ve seen models ranging from the current – 100% of an evaluation based on principal evaluation of a teacher to ones that are pushing for 50% + of an evaluation being based on student performance gains from a teacher.

    I’ve also heard about some evaluations that bring in expert evaluators from outside the school to observe teachers and rate them against a Danielson rubric.

    Which elements would you include? What process would you encourage? It sounds like “data-informed accountability” would include the test score piece as one of a number of other metrics. Would that be the way to get rid of the bottom 8-10%? Would it be strong enough to split the other 90% into performance buckets to potentially reward them differently (pay and autonomy)?

    I see your points, but I don’t think the status quo is tenable and so I’m trying to figure out what makes the most sense.


  5. How does one evaluate by test scores those teachers who teach a subject that is not tested?
    E.g., Social Studies, Art, Music, computers, foreign language, etc.
    Will “reformers” expect that schools will add national standardized tests in those subject areas?

    Oh, I take it that Mr. Brill did get the ELL and other data about Harlem school wrong?
    He hasn’t rebutted it.

  6. I’m not an educator but I wanted to comment. I worked in the printing industry for 10 years and was never evaluated formally. I worked as a newspaper reporter/editor for 15 years and was formally evaluated only by one employer. The evaluations were based on his closely observed impressions of my demeanor and output, both of which he knew from daily personal experience. I’ve worked for a state agency for five years and been formally evaluated based on close knowledge of my work and demeanor. None of those evaluations used data. None were based on my effect on other people.

    Are there good analogs in the private sector for what teachers do? I’m not sure. I’m asking an open question, not being sarcastic.

    You can judge some salespeople on sales. That’s not the same. And with education you have the tricky point that teachers can teach but only students can learn. Lead a horse to water, etc.

    If you’re going to hold teachers responsible, then you have to give them all the authority to control the conditions, at least in school, and we don’t do that.

    At the same time, as a parent I see teachers who refuse to see that state assessments are telling them something true about the students, which I believe they do, based on my kids. The state assessments are more accurate than the classroom grades, as I can see from daily interaction with my kids and their classwork.
    So what do we do?

  7. Dean

    I see two good ways, and one status quo-type way to use test scores. The first would be a long, complex, process of:
    a) determining how much confidence the system should have in growth models that are largely designed and tested in more homogenous elementary schools so that we’d have some sort of evidence of how much confidence we should have in evaluating each type of teacher in each type of school,
    b) training every evaluator in the limits of the data, the caveats, etc.,
    c) an appeals process, and
    d) enough litigation to get us to the point where we can make informed judgements.

    Or the better way would be to present the data to a committee of administrators and teachers from the school in question, who know the overall realities, and use evaluations as a part of a collaborative process.

    The most likely scenario, however, would be to reinvent the statistical tricks of NCLB with everyone lying to each other so much that eventually the data loses all value.

    The problem with the first and third scenarios is that they would encourage even more test prep and lowest common denominator instruction.

    In terms of performance pay, the stakes aren’t as high. If I have a 15% chance per year that I get more or less of a bonus, that isn’t the same as a 15% chance per year of your career being destroyed or damaged. Perhaps the biggest harm is that under performance pay teachers will see and argue over miscarriages of justice every year while under data-driven evaluations, teachers will see the career of an effective teacher in the building being destroyed every year.

    That’s why collaboration needs to be built into either performance pay or test score evaluations.

    Most systems seem to be based on Danielson so I’d go with her. D.C.’s Impact, however, has Danielson and everything else but the kitchen sink and it seems particularly out of touch with the reality of the urban classroom. It used to be rare for me to have more than one kid per class with full-blown mental illness and now it’s not unusual to have a half dozen along with the armed robbers, gang members, kids on IEPs and ELLs. A former AP or IB teacher doing the evaluation needs to realize when entering many of today’s classrooms he or she is entering a different world.

    Ed is right about increasing the testing, thus increasing the abuses and dangers, just to fire people.

    And I still haven’t gotten an answer how the previous three years of “Science” or “Math” results on middle school tests focusing on arithmetic and facts are going to be used to create growth targets for Algebra I or Biology I. In our state we’ll soon have three or four different types of EOI tests all created by different types of committees and corporations. If you teach a class with a test designed during the early days of NCLB to be easy, are you home free? But if you teach a class with an America’s Choice EOI designed for real college prep, are you always going to get screwed, no matter how effective you are at transitioning kids from rote instruction to college prep enquiry?

    And as far as the weight of test scores, I don’t see that as more important than the context. We can’t have test scores DRIVE the process, where scores indict a teacher as ineffective. But scores could complement or supplement evaluations. That would be the essence of data-DRIVEN versus data-INFORMED.

    The Toledo Plan without using test scores consistently removes nearly 110% of teachers per year. I’d prefer it as the foundation of tougher evaluations, but I’d go with the Grand Bargain of the Toledo Plan using test score growth.

  8. In response to the Brill piece, it seems grown-ups are trying to explain away Harlem children’s successes. This is most disappointing and unfortunate. Here are the facts about our scholars and their demographics:

    Special education:
    1. PS 149 in fact tests FEWER special education students than we do at HSA.  They had only 3 children with IEPs take the 3rd grade test while we had 9 children with IEPs take the test in 2009.  (See PS 149 report card, page 14)  PS 149 either doesn’t have as many students with IEPs in 3rd grade or is not testing them.
    1. HSA actually tested more students classified as “economically disadvantaged.”  HSA tested 43 economically disadvantaged students while PS 149 tested 39.  True, as a percentage of the overall students tested, PS 149’s percentage is higher.  However, and this is a big however, poverty was not a determinant of our students’ performance.  Of our 43 “economically disadvantaged” students who took the test, 93% passed the ELA and 33% got “4s.”  Our “economically disadvantaged” students had a significantly higher percentage of “4s” than our not “economically disadvantaged” students.  In fact, while 100% of our not “economically disadvantaged” students passed the test, NONE got “4s.”  On math, poverty was also not a determinant in performance.  100% of our “economically disadvantaged” students passed the math test and 63% got “4s.”
    2. The stats she cites for poverty are incorrect.  If you look at the report cards for PS 149 and HSA — we have the SAME EXACT percentage of students eligible for free or reduced lunch — 70%.  They do have a higher percentage of students eligible for free lunch, however, they certainly do not have the 81% free lunch that the bloggers claim (it’s 68%).  And again, our students eligible for free priced lunch still aced the tests, so it’s really not an excuse.
    3. There are many schools you could compare HSA to that have far fewer economically disadvantaged students that nonetheless have far lower scores.  For example, PS 6 on the Upper East Side of Manhattan has just 10% free and reduced students (and 9.6% of 3rd graders tested are “economically disadvantaged”) and HSA still outperformed them.  

    1. PS 149 only had 2 Limited English Proficient (LEP) students take the 3rd grade test in 2009.  While HSA did not have any LEP students take the test, I don’t think that a difference of two students is significant enough to draw any major conclusions.  We do have LEP students taking the test this year, so we’ll be able to see in the coming months whether we were able to help LEP students pass the tests or not.  
    2. While we are not arguing with the point that we’ve had trouble attracting LEP students, we have for next year given preferential admission to them in the lottery, so I suspect the disparity will be gone next year.  

    1. We don’t track or report students in temporary housing so I don’t know where the first blogger would have gotten her information.  I know anecdotally that we have dozens of families across the network in temporary housing, but I unfortunately don’t have hard stats on this one.
    2. I can say, though, that the blogger’s assertion that PS 149 has 10% homelessness is false.  In 07-08, PS 149 had 476 students, 13 of whom were in temporary housing.  That’s 2.7%.  Here’s the link to verify: [ ]
    While it doesn’t list 08-09 stats, it seems unlikely that their homelessness stats increased to 10% from 2.7% in one year.

  9. Thanks for a reminder of the good ol’ days before the complete proliferation of choice when my school only had 70% free and reduced lunch. I don’t begrudge Harlem Success or anyone else for their accomplishments. (Our district, which is 90% poor, doesn’t have many schools with a 70% disadvantaged rate but every one of those schools is excellent.) Brill, however, doesn’t know how much harm he’s doing to the neighborhood schools with his tirades. Typically, I have as many at-risk students in my five classes serving 200 or so students as all 14 of our district’s charters put together. But I’m not complaining; most of our middle school teachers have it worse. Evaluate us using test score growth and we’ll barely be able to find long-term subs.
