Friday, June 20, 2008

Study Finds Little Benefit in New SAT

This from the New York Times:

The revamped SAT, expanded three years ago to include a writing test, predicts college success no better than the old test, and not quite as well as a student’s high school grades, according to studies released Tuesday by the College Board, which owns the test.

“The changes made to the SAT did not substantially change how predictive the test is of first-year college performance,” the studies said.

College Board officials presented their findings as “important and positive” confirmation of the test’s success.

“The SAT continues to be an excellent predictor of how students will perform,” said Laurence Bunin, senior vice president of operations at the board, and general manager of the SAT program. “The 3-hour, 45-minute test is almost as good a predictor as four years of high school grades, and a better predictor for minority students.”

But critics of the new test say that if that is the best it can do, the extra time, expense and stress on students are not worth it.

“The new SAT was supposed to be significantly better and fairer than the old one, but it is neither,” said Robert Schaeffer, the public education director at FairTest, a group that is critical of much standardized testing. “It underpredicts college success for females and those whose best language is not English, and over all, it does not predict college success as well as high school grades, so why do we need the SAT, old or new?”

The reports, called validity studies, are based on individual data from 151,000 students at more than 100 colleges and universities who started college in fall of 2006.

Plans to revise the SAT were announced in 2002, the year after the University of California president, Richard Atkinson, threatened to drop the test as an admission requirement.

“Given the data released today, what was the point of all the hoopla about the SAT’s revisions beyond preserving their California market?” Mr. Schaeffer said. “This is all spin. It’s been a marketing operation from the get-go.” ...

5 comments:

Anonymous said...

I hope your readers and the CATS Task Force get the implications of this important revelation. The College Board was unable to develop a writing test that provided anything of value in predicting college performance.

That doesn't mean writing isn't important -- it certainly is -- but the SAT experience raises some very disturbing questions about whether its current writing assessment model, which is very similar to those in CATS, is effective in evaluating writing, and whether the extreme amount of time spent scoring writing adds much accuracy to the final average test score.

Richard Day said...

Writing is important. It is also extremely difficult to measure.

The difficulty lies in trying to quantify a skill that is not well suited for quantification.

The problem for Kentucky is that not trying to measure writing (that is to say, leaving it out of CATS altogether) would push teaching it down the priority list for teachers and principals. In a high-stakes environment, what gets measured gets the most attention.

That the College Board failed to do better than Kentucky in designing such a test is neither a condemnation of the College Board nor a significant concern for Kentucky. The two assessments have very different purposes.

The question for Kentucky is whether the writing test showed that our students are learning the writing curriculum and that writing is a priority.

If one is interested in predicting future performance in college, this study seems to suggest that high school grades would do just as well - and they are a lot cheaper than the SAT.

Anonymous said...

The argument that anything not in the assessment will always be unimportant to schools is easy to challenge. Just try to remove football from most campuses in this state.

Is it helpful to include an assessment item that even you admit isn't accurate? Also, how helpful is the CATS writing assessment when teachers don't get any deep feedback from it?

Could we not have something like a serious audit process outside of CATS with some carrots (like statewide honors for really good papers) and sticks for poor performance? Properly done, this could give teachers as well as students lots of helpful information.

Richard Day said...

Sorry, but the football analogy doesn't work.

A better comparison is the arts. Where states have removed art & music from the assessment, many schools dropped classes that principals would otherwise have conceded were valuable.

The problem with assessing writing is not that it can't be evaluated at all - it's that the nature of the discipline defies easy quantification. Reliability suffers. It requires a qualitative approach that is rejected by those who only value quantification.

But that approach is not without value. It works very well in distinguishing between good writing and bad - but not between good writing and very good writing. Both the creativity embodied in the craft and the inter-rater reliability among evaluators pose problems.
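For readers who want to see what "inter-rater reliability" actually measures, here is a rough sketch using Cohen's kappa for two hypothetical raters scoring the same ten essays on a 1-4 rubric. The scores are invented purely for illustration; this is not CATS or SAT data.

```
# Minimal sketch: Cohen's kappa for two raters scoring the same essays
# on a 1-4 rubric. The scores below are made up for illustration only.
from collections import Counter

rater_a = [4, 3, 3, 2, 4, 1, 3, 2, 2, 3]
rater_b = [3, 3, 4, 2, 4, 2, 3, 1, 2, 3]

n = len(rater_a)

# Observed agreement: fraction of essays where the raters gave the same score.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability the raters would agree if each scored at random
# according to their own score distribution.
dist_a = Counter(rater_a)
dist_b = Counter(rater_b)
p_chance = sum((dist_a[s] / n) * (dist_b[s] / n) for s in set(rater_a) | set(rater_b))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Observed agreement: {p_observed:.2f}, chance agreement: {p_chance:.2f}, kappa: {kappa:.2f}")
```

A kappa near 1 would mean the raters agree far better than chance; a middling value is exactly the kind of reliability problem described above.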

You would have the same problem evaluating poetry, or worse, visual art. Multiple-choice questions about the nuts and bolts of the discipline can be measured - but not the art itself. Attempts to quantify art are ill-advised.

Any test that attempts to measure writing without the student actually writing is of little value.

The carrot and stick idea is useful for motivation, but doesn't solve the problem. Can we agree on what constitutes a really good paper?

Richard Day said...

Oops. I missed a piece of your comment.

You asked, "...how helpful is the CATS writing assessment when teachers don't get any deep feedback from it?"

It is mainly helpful in giving the legislature a moderately accurate picture of how children are progressing in their writing skills, though principals do get some useful long-term feedback from it.

I know it's got a larger margin of error than you'd want...and you just have to allow for that and discount the accuracy of the results a little.

But it's still the best measure you've got, and the SAT folks could not overcome the inherent problems of assessing the craft of writing any more than Kentucky could.

Imagine you've got a broken yardstick. For some reason the last two inches are cut off...and you need to measure 3 feet of something. You're not going to just eyeball it with that yardstick in your hand. Why not? Because you know the yardstick will get you close, and you have a good idea of how big the error of measurement is.

And if you have a sense of the degree of error, you can at least see trends and make predictions - albeit with lower than desired reliability. As a principal I relied on CATS for predictions and it provided a pretty good yardstick for me over the years. I even created my own report card for principals that I used to assess our schools' progress. Skip Kifer helped me design it...somewhere around 1999...and we reviewed and plotted the data for several years.
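To make the trend point concrete, here is a minimal sketch with invented yearly index scores (not actual CATS or school data) showing how an ordinary least-squares line can surface a trend even when each year's score carries a few points of error.

```
# Minimal sketch: fitting a least-squares trend line to noisy yearly scores.
# The years and scores below are invented for illustration only.
years = [1999, 2000, 2001, 2002, 2003, 2004]
scores = [58.1, 60.4, 59.2, 62.8, 63.1, 65.0]  # hypothetical index, each +/- a few points of error

n = len(years)
mean_x = sum(years) / n
mean_y = sum(scores) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores)) / sum(
    (x - mean_x) ** 2 for x in years
)
intercept = mean_y - slope * mean_x

print(f"Estimated gain per year: {slope:+.2f} points")
print(f"Projected score for 2005: {slope * 2005 + intercept:.1f}")
```

Even with a point or two of noise in any single year, the slope of the line is what tells the story - which is the sense in which a "broken yardstick" can still track progress.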

The problem for the SAT is that the study undermines confidence in a test whose use in the US already appears to be eroding.

The SAT study is not an indictment of CATS - a completely different test with a different fundamental design - in any way, shape or form.