Where Do Testing and Accountability Go From Here?

My Bellwether colleagues Alex Spurrier, Jenn Schiess, Andy Rotherham, and I released a set of briefs today looking at the past, present, and future of standards-based reform. Those include:

  1. In The Historical Roots and Theory of Change of Modern School Accountability, we review the history and logic behind standards-based reform to recall the foundational goals and rationale for the main strategic levers reformers were trying to pull.
  2. In The Impact of Standards-Based Accountability, we assess the strengths and weaknesses of the ways in which standards-based reform has been operationalized in policy and practice and begin to identify what should be retained and what should evolve.
  3. In Assessment and Accountability in the Wake of COVID-19, we explore what accountability may mean in a global pandemic, as challenges of equity in our education systems are exacerbated and the need to rapidly assess and address those challenges is urgent.

A forthcoming webinar will further explore these topics.

Join us on Monday, July 20th for a conversation with Jeb Bush, John B. King, Jr., and Carissa Moffat Miller about how we should measure the impact of education systems on students, particularly students of color and low-income students, even as COVID-19 changes schooling dramatically. Register and learn more here.

–Guest post by Chad Aldeman 

#EduFridayFive: A Conversation on Academic Skills at Kindergarten Entry with Christine Pitts

Last month a team of researchers published a new working paper looking at how the academic skills of entering kindergartners have changed over time. I reached out to one of the co-authors, Christine Pitts, a Research Scientist and Policy Advisor at NWEA, to answer five short questions about the project. Read her answers below:

How would you describe this project in 200 words or less?

Research shows that early math and reading skills are a strong predictor of future achievement. So, it is not surprising that state and local policymakers often use early childhood initiatives, albeit usually disjointed from K12 systems, to improve learning outcomes. Today, we are challenged because the most recent wave of nationally representative data on academic readiness at school entry, the Early Childhood Longitudinal Study-Kindergarten Cohort (ECLS-K), was collected during the 2010-11 school year. However, there have been myriad changes in U.S. policy and society since then that likely affected children and the contexts contributing to their achievement upon kindergarten entry. The findings in this study provide initial evidence of trends in students’ academic achievement at school entry since the end of the ECLS-K dataset. While the findings are mixed and require additional exploration, there are three big takeaways:

  1. Kids entering kindergarten after 2014 performed slightly worse on math and reading standardized assessments than those who entered kindergarten during or before 2014.
  2. Racial/ethnic and school poverty achievement gaps at school entry narrowed significantly, but modestly, from 2010 to 2017, with the most narrowing occurring in recent years.
  3. There was no relationship between district preschool enrollment and trends in achievement at school entry across time.

What would most people miss about this project if they only read the headline? 

While the findings of this study are powerful because they show an illustrative view of students’ academic trends over the last decade, we are aware that this is one descriptive slice of a broader, more comprehensive set of student outcomes with a host of related factors. Our hope is that after readers digest the big takeaways from this work, namely that kindergarteners in 2017 have slightly lower math and reading skills than in 2010, but inequalities by race/ethnicity and poverty have decreased, they will be asking questions like why, how, and what does this mean? As such, we are interested in braiding this line of inquiry with other pragmatic research agendas informing education leaders across the nation. We want our research and policy colleagues to see these findings as an invitation to partner on future studies where we can investigate the contexts and mechanisms underlying these downward trends and what this means for policymakers setting up the next decade of incoming kindergartners in America.

What compelled you to do this work? 

Early childhood is an incredibly important developmental period, but because there are few well-developed measures of skills before and at school entry, it is hard to systematically examine how well different early childhood experiences prepare students for K12 achievement. My children each attended different preschool settings (e.g., in-home, private). Now that my oldest two are taking standardized assessments in the K12 system, we can see some of the differences in their academic development that may be attributable to their different preschool experiences. It is obvious to me from these data and my own experience in research and evaluation that our schools require standardized measures of early learning development to evaluate the impact of different early childhood experiences. But during this past legislative session in Oregon, a bill was introduced, the Too Young to Test bill, that aimed to remove all standardized assessments before third grade. This bill illuminated the common misconception and over-generalization that tests are always bad. Concurrently, we were discussing the utility of interim data, collected from the beginning of kindergarten, throughout the school year, and well into students’ elementary and middle school years, for exploring broad trends in student achievement at school entry. It was clear that this study could provide an example of the need for high-quality measures of skills that span the ages of school entry and beyond.

What would a smart critic say about it, and how would you respond? 

Two explicit critiques have come up since presenting this study to our peer network. First, folks challenge us on the topic of testing early learners, especially upon their first experiences with the K-12 system. While there are a lot of important early skills that we did not examine, like self-regulation and social skills, the measure we used does a pretty good job of accommodating early learners through warm-up questions, audio instructions, and a visual interface designed to engage the youngest students. The other central criticism is that we have a non-random sample of the U.S. population that changed over time. Each year the cohort grew in size and became more racially/ethnically diverse. We used a weighting procedure to (a) correct for the non-random sampling of those using our test and (b) match yearly national proportions in school racial/ethnic breakdown, urbanicity, FRPL, and district socioeconomic measures. It is important to understand that even if our sample does not perfectly mirror the national population of U.S. kindergarteners, it reflects a substantial portion of them, approximately one in every 10 kindergarten students between 2014 and 2017.

Other than this project, what are you most excited about right now?

Well, to be honest, I used to be an elementary school teacher and fall always makes me come alive with so many exciting changes. My kids are getting older and facing new challenges in school, ballots come out and we get to vote on exciting local initiatives, and it is noteworthy that we are embarking on a new strategic vision for our policy and advocacy work at NWEA. Our efforts will align key organizational priorities around equity and evidence-based policy. For example, I have the privilege of working strategically with our research team to explore how policy mechanisms can be highlighted within the research evidence that they promote through academic channels. For the current study, we crafted a research brief with key next steps for policymakers who are trying to evaluate the mechanisms underlying these descriptive trends at school entry. Other areas under study in our research department include summer learning loss and measuring social and emotional skill development, each lending itself nicely to relevant national policy discussions. I encourage anyone interested to take a look at our research centers and explore research briefs on a variety of other topics.

–Guest post by Chad Aldeman

#EduFridayFive: The Collaborative for Student Success on Assessment HQ

Earlier this week the Collaborative for Student Success* released Assessment HQ, a new go-to resource for information on state assessments, including data and results from those assessments over time. To learn more about the project, I posed the #EduFridayFive questions to the Collaborative team. Read their answers below:

How would you describe this project in 200 words or less? 

The Collaborative for Student Success launched Assessment HQ to help build greater understanding about the role annual assessments play and how they are being used to advance educational equity and improvement in student achievement across the country. Don’t get us wrong, no one is saying annual assessments are perfect. However, they are an important tool to help educators and policymakers monitor the progress of all students in gaining the knowledge and skills they need over time. They are particularly important for those students who are most vulnerable or who historically have been underserved by education systems. We set out to create one place where individuals can find state-by-state student proficiency data, original commentary, resources, and state/national news, all in an easy-to-navigate format.

What would most people miss about this project if they only read the headline? 

A lot of data is gathered from K-12 assessments, but sometimes it is not easy to access or it may not be clear what the data says. For the first time, student proficiency data for more than half of the states across the country is publicly available online, and in one location, for anyone to view and use. Assessment HQ highlights state-reported student performance results for grades 3-8, in mathematics and English language arts (ELA). The new site also allows users to see trends in student proficiency in individual states and observe the performance of student groups, like African American and Hispanic students. Only by exploring trend data on these students can we ensure they are making real progress.

What compelled you to do this work? 

Outside of the education policy community, it has largely gone unnoticed that the assessment landscape is constantly changing, with states switching vendors, adopting new proficiency calculations, and debating proper accountability. That constant change is often not in the best interest of students or educators. We’re hopeful that this platform will help cut through much of the uncertainty around tests by offering a clean, consolidated look at actual state-reported information. Let’s face it, tests are an easy punching bag. This site can contribute to a more informed dialogue around assessments and work to avoid situations where the politically expedient choice comes before what’s best for our students.

What would a smart critic say about it, and how would you respond? 

Student proficiency data captures one moment in time. It cannot – and should not – be used in isolation when viewing the work being done to help students succeed. Through the original commentary provided by Dale Chu on the Testing 1-2-3 blog, resources from state and national partners, as well as news coverage, we strive to provide the background information and nuance about the factors that contribute to assessment choices and results. We have assembled available state assessment data based on those states that have kept the same test in place for four years. We only needed three years to show a trend, but we set a higher bar for ourselves. With the goal of improving student success for every child, we must make sure that assessments are aligned to high standards, informative to parents and policymakers, comparable, meaningful, and actionable.

Other than this project, what are you most excited about right now?

The Top Gun sequel…it’s been too long since Maverick did his thing!

Seriously though, the Collaborative has always been focused on its role as a nonpartisan player that is dedicated to ensuring that students are held to high standards and have the resources and supports necessary for their success. Funding and resources are a perennial topic in education and we’re currently very interested in the move states are making to report out per-pupil spending at the school level vs. just at the district level. This provision in the Every Student Succeeds Act hasn’t received a ton of coverage, but it’s a significant step forward in understanding how education dollars are being spent. Also, we’re following the development of needs assessments as states develop Career and Technical Education plans in coordination with local and regional business leaders.

–Guest post by Chad Aldeman

*Disclosure: The Collaborative is a client of Bellwether’s, although not on this project. 

Introducing #EduFridayFive: A Conversation on the State of Assessments with Bonnie O’Keefe

I’m pleased to introduce a new recurring feature today, an education-focused “Friday Five.” We’ve created a standard set of five questions, and we’ll ask guests to briefly respond, in their own words, about their work. The goal is to hear from interesting people across education who are leading new initiatives or research projects. You’ll see us using this format occasionally here on Eduwonk and at Bellwether’s group blog Ahead of the Heard.

For the series launch, I reached out to Bellwether Associate Partner Bonnie O’Keefe. Bonnie is the co-author, along with Bellwether Analyst Brandon Lewis, of a new paper about the future of state assessments. State assessment policy is at a critical juncture, and the national conversation has not yet caught up to some of the innovations playing out in the states. You’ll have to read the full paper to understand the whole picture, but what follows are Bonnie’s answers to the Friday Five:

How would you describe this project in 200 words or less? 

There are lots of opportunities available to states to improve and innovate on their assessments under current federal law, but states don’t seem to be taking them. We look at the reasons why, and lift up some examples of states moving in interesting directions around assessment. We focus on four areas in particular:

  1. Interim assessments for accountability
  2. Formative assessments to support instruction
  3. Shared item banks and new collaborations among states
  4. Social studies and science assessments

There are a few states starting to think outside the box on assessment, and a larger group making more subtle moves under the radar. But some states are at risk of backsliding on assessment quality because tests have become so politically toxic. We argue that investment in assessment is still important and valuable. States should work towards a well-rounded system of assessments (not just one test) that can support accountability, equity, and transparency, and also support teachers in real and useful ways.

What would most people miss about this project if they only read the headline? 

One, innovation in testing isn’t just about technology. There are some exciting examples that use technology to make tests faster, more accurate, or more engaging. But there are also examples where states are innovating away from technology and towards interactive or longer-term tasks created, administered, and graded by teachers.

Two, we’re not just talking about end-of-year reading and math tests. I was especially interested in exploring facets of state work on assessment that fall outside what federal law mandates. We highlight science, social studies, and formative assessment for instruction. But you could also include things like early childhood and K-2 assessments, or assessments for English learners.

What compelled you to do this work? 

Many of the ideas we highlight in this brief get talked about at assessment conferences. But to someone involved in education policy who doesn’t specialize in assessment, especially policymakers, testing might just seem like a complicated, controversial chore. Why would you want to invest money and time in testing? I thought it was important to make the counterargument to that line of thinking, and delve into some ways that innovation and improvement are available and valuable for states right now.

What would a smart critic say about it, and how would you respond? 

If someone comes in dead set against testing of any kind, I doubt this paper will sway them, but I hope it provides some nuanced insight into what innovative tests can look like, and why it is worth improving tests, rather than eliminating them.

I could anticipate other critics saying that states shouldn’t expand their role in testing, should stick only to what is mandated, and leave everything else to local decision-makers. My response is that we’ve seen states do only the bare minimum, and what happens is basically a waste of time and money. Students and teachers still have to spend their time on tests, but the tests are less useful and lower quality, and they don’t help anyone improve. It’s worthwhile to be more ambitious and innovative in order to make assessments a positive force in schools.

Other than this project, what are you most excited about right now?

In life, I’m excited for summertime adventures in the Finger Lakes (I’m based in Rochester, NY).

In education policy, I’m in the middle of a research project on local school performance frameworks that I’m very excited to share this fall. So, if anyone reading knows of work happening in their district to create or revise a school performance framework, they should send me an email!

–Guest post by Chad Aldeman