In the Nation's Interest

6 Takeaways from Scores on the New Common Core-Aligned Tests

Recently Massachusetts, known as a leader in K12 educational quality, made headlines by deciding to stop using one of the new, more rigorous tests aligned with the Common Core standards.  This is a real setback for improving K12 education, American students, and our future workforce.

Key:
Orange states: PARCC assessment
Green states: Smarter Balanced assessment
Grey states: Other assessment
Red check marks: states that have reported 2014-15 test results

Source: Education Week: http://www.edweek.org/ew/section/multimedia/map-common-core-2015-test-results.html#or

The Common Core Standards, Testing, and the Difference between Them

First, by way of background, the Common Core State Standards are a set of rigorous standards for what K12 students should know and be able to do to be college- and career-ready.  The National Governors Association and the Council of Chief State School Officers developed them beginning in 2009.  States could voluntarily adopt the Common Core, and ultimately 46 states and the District of Columbia did so.  Later, however, adoption of the standards became politicized – to put it mildly – even figuring in the presidential nominating races.  Three states have since withdrawn from the Common Core.

There are many reasons why the Common Core standards were and remain important, but topping the list are their rigor, compared to what came before, and the fact that they were envisioned as being common across most of the US.  Their rigor is important at a time when the quality of our country’s human capital – a.k.a., our labor force – is more essential than ever before.  The US labor force has consistently scored below the average for developed countries when it comes to skills.

And though most commentators don’t focus on it, the commonality of the Common Core standards was also central to their potential for generating reform.  By creating a shared framework for the general content and sequence of math and English instruction in grades K12, we could benefit from economies of scale in developing textbooks, training teachers, and so on.  Greater commonality and efficiency in these areas would have meant more resources to devote to making those textbooks and teachers better at helping our students achieve the outcomes we want for them.

And, after all, is the mathematics that a 5th grader needs to know in state A really that different from what they need to know in state B?

Yet, as important as it is to have rigorous common standards, common rigorous tests are even more important.  As they say, “What gets measured, gets done.”  To the extent that we adopt standards without actually measuring performance against those standards, the standards become largely aspirational – or to use another word, toothless.  The analogy would be to providing an employee with performance goals at the beginning of the year, but never reviewing their performance against those goals.

More important than the motivational benefits that shared tests would provide, such tests would also allow us to learn how to improve student performance.  By being able to compare the performance of schools to each other, even across state lines, we could begin learning from the huge amount of experimentation that occurs in different schools, districts, and states nationwide.  The reality is that schools, districts, and states are already trying a multiplicity of ways to improve student learning.  But given the lack of common metrics of student achievement, it’s very difficult to determine what’s effective, especially if schools are in different states.  We are like scientists who run a wide range of elaborate experiments, but never bother to measure or analyze their outcomes. (1)

Serious educational reformers understood the importance of having common state tests that were aligned to challenging standards.  And so two consortia of states were convened to develop new, “better” tests aligned to the more rigorous standards of the Common Core – “better” in the sense that the tests were meant to be more accurate assessments of students’ actual skills as opposed to their test-taking abilities.  The two consortia were PARCC and Smarter Balanced.  Unfortunately, in a move that fueled the subsequent politicization of the tests, the federal government (specifically, the Obama Administration) gave funding to the two consortia in 2010 to support development of the tests – prompting opposition from the opposite party.  (Note, however, that the federal government had no role in developing the content of the Common Core standards.)

Fast Forward to Today: Finally, the First Tests That Allow Us to Compare States

Developing high-quality standardized tests is both expensive and time-consuming,(2)  which is why, five years later, we are only now seeing the first set of test results from the PARCC and Smarter Balanced states.  Put another way, this is the first time we have data that will allow us to compare student performance, apples-to-apples, across states – and more importantly, across schools in different states, since it is really in schools, not states, where the rubber hits the road in terms of student instruction.

As the map at the top of this post shows, 18 states administered the Smarter Balanced tests to all their students during the 2014-15 school year (the green states on the map).  Eleven states and the District of Columbia administered the PARCC test (orange states on the map).  The states with the red checks are the states for which aggregated student test scores have now been reported.  That means, for the first time ever, we can now compare student performance either among the 12 Smarter Balanced states that have reported their data, or among the 6 PARCC states.  (Because the tests are different, one can only compare performance among students, schools, or states that used the same test.)  This is not as good as the vision of actually being able to compare the performance of states, schools, and students across the US, but it is still a good deal better than what we have had to date!

The real promise of these data lies in the chance to get very “granular” and analyze outcomes at the school level, where education actually happens.  This will allow analysts to start teasing out what distinguishes schools that are more (or less) effective with particular types of students.  For example, some schools are much more successful with disadvantaged students than others are.  What is it that those schools do differently?

Such inquiries will require fairly sophisticated statistical analyses and access to school-level data.  In the meantime, here are some quick take-aways from the 12 Smarter Balanced states that have reported their scores as of November 23, 2015.

Takeaway 1: Bravo to the 29 states that stuck with the new, common tests!

It takes some courage for governors and chief state school officers to stick with either the Smarter Balanced or PARCC tests given the political firestorm around them and the fact that they are more demanding than previous state tests (see Takeaway 2 below).  The 29 states (and District of Columbia) that did so deserve kudos for being courageous enough to look some uncomfortable truths straight in the eye – i.e., that student achievement levels are not as good as we pretend – in the interest of improving performance.

Takeaway 2: The new tests really are more demanding

Source: Education Week: http://www.edweek.org/ew/section/multimedia/map-common-core-2015-test-results.html#or

Note: In 2012-13 Idaho did not administer its state test to 11th graders.  The 10th grade scores are provided as the nearest comparable score to the 11th graders’ scores on the Smarter Balanced test in 2014-15.

In 2012-13, Idaho reported that almost 90% of its students were “proficient” in English language arts, based on the results of its own state test (the grey bars in the chart above).  Sounds good, until one sees that on the more rigorous Smarter Balanced test only about half of Idaho students (the red bars) scored “proficient”.  And I don’t intend to pick on Idaho here: this pattern is typical of most states, where proficiency rates on the Smarter Balanced or PARCC tests fell by a third to a half relative to the previous state-specific tests.

Takeaway 3: Roughly half of students score “proficient” in English

For the 12 states currently reporting scores on the 2014-15 Smarter Balanced tests, approximately half of students in grades 3-8 scored “proficient” in English language arts, with a range from 42% for California to 58% for Missouri.(3)   Whether one sees the glass as half full – “Half of American students are proficient in English!” – or half empty – “Nearly half of American students lack proficiency in English language arts!” – is in the eye of the beholder.

Takeaway 4: Over half lack proficiency in math

Proficiency rates in math are lower than in English: across the 12 states reporting, the average proficiency rate in grades 3-8 was 41%, with California at the low end – only a third (34%) of its 3rd-8th graders scored “proficient” – and Washington the only state where half (50%) did.  Unfortunately, the news is only worse for older students: averaged across the 12 states, less than a third (30%) of 11th graders scored “proficient” in math.

Takeaway 5: Proficiency stays on track for English as kids get older, but drops for math

The most disturbing back-of-the-envelope analysis I did with the Smarter Balanced data was to average proficiency rates by grade across the 12 states currently reporting data.  As the chart above shows, average proficiency in these 12 states was 50% for both English language arts and math in 3rd grade.  For English (the upper/blue line), proficiency rates hold relatively steady from grades 3 through 8, and again in 11th grade when the test is next given.  In math (the lower/red line), however, average proficiency rates drop below 40% by 5th grade, then hold relatively steady through middle school, but drop again to 30% for 11th graders when averaged across the 12 states.  Put another way, less than a third of 11th graders in these states are proficient in math.(4)   So much for preparing our students for the much-vaunted STEM (science, technology, engineering, and mathematics) jobs of the future.

Takeaway 6: Common tests allow more sophisticated analyses than these

The charts and takeaways that I’ve shared here are just quick back-of-the-envelope comparisons of average state scores for the 12 states reporting 2014-15 data from the Smarter Balanced tests.  As I alluded to above, the real power of these data to improve educational outcomes will occur when analysts drill down to school-level data to start teasing out patterns of what makes schools more effective in educating students.  (Using state-level data is kind of like using data about the hospital industry as a whole to try to figure out how to improve hospitals.)

I am confident that the availability of these apples-to-apples test scores will cause education experts to conduct those analyses.  The 12 states that are using the Smarter Balanced tests will be able to learn from each other in a way they couldn’t before, just as the states that stick with the PARCC test will be able to learn from each other within theirs.  So kudos to the governors and chief state school officers in those states for having the courage to assess their student achievement rigorously in the face of a political firefight rather than settling for feel-good proficiency scores.  We can’t improve if we’re not honest with ourselves about how we’re doing.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(1) Actually, to be more accurate, states do gather a large amount of data in the form of student test scores, but since most states insist on having their own state-specific test, there is no ability to compare schools across different states.  A more accurate version of my analogy would be to say that we run a large number of experiments in which every laboratory has defined its own idiosyncratic set of measures, thereby making it much harder for different labs to work together to increase the general body of knowledge about what works. 

(2) Which is another argument for why states should collaborate in using a shared test, as opposed to each state spending its own funds to develop its own test.

(3) However, Missouri used a “scaled-down” version of the Smarter Balanced test in some grades, so its scores may not be directly comparable.

(4) And actually it’s probably worse.  I was taking a simple average of the 12 states’ average scores, rather than weighting their scores based on state population.
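
The distinction in this footnote can be made concrete with a small sketch.  The state names and figures below are purely hypothetical – they are not actual Smarter Balanced results – and serve only to illustrate why a simple (unweighted) average of state rates overstates overall proficiency when larger states score lower:

```python
# Hypothetical data: proficiency rate and number of tested students per state.
# Illustrative values only, not actual Smarter Balanced results.
states = {
    "Large State A": (0.34, 450_000),  # big state, lower rate
    "Small State B": (0.50, 40_000),   # small state, higher rate
    "Small State C": (0.48, 60_000),
}

# Simple average: each state counts equally, regardless of size.
rates = [rate for rate, _ in states.values()]
simple_avg = sum(rates) / len(rates)

# Weighted average: each student counts equally, so big states dominate.
total_students = sum(n for _, n in states.values())
weighted_avg = sum(rate * n for rate, n in states.values()) / total_students

print(f"simple average:   {simple_avg:.3f}")    # → simple average:   0.440
print(f"weighted average: {weighted_avg:.3f}")  # → weighted average: 0.367
```

Because the large, lower-scoring state holds most of the students, the student-weighted average comes out several points below the simple state average – exactly the direction of bias the footnote describes.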