Wednesday, March 09, 2011

Measuring the Gap under SB 1

The new SB 1 test got a first reading before the Board of Education in February and is scheduled for a second reading in April. We are in a 60-day window set aside for public comment, so let's talk about it.

One of the issues that has worried me is this: How should the achievement gap be measured?

This is important because, however the state calculates it, teachers will plan their strategies around whatever focus is likely to produce the best test scores. One approach might induce teachers to focus on a small subset of the kids who are said to be “in the gap.” A different approach would broaden that focus.

KSN&C spoke to KDE testing guy Ken Draut at the AdvancED Conference in December and learned of KDE’s plans to use ACT benchmarks to measure the gap. Uh oh.

Generally, KDE is looking at what they call a balanced accountability approach.

Achievement score data would come from five days of testing with the new SB 1 test for grades 3-8, built around the new Kentucky Core Academic Standards as they are implemented. Scores will be calculated around “proficiency,” meaning that cut scores will be set to determine performance levels: Novice = 0; Apprentice = 0.5; Proficient = 1.0; and Distinguished = 1.5. The extra 0.5 given to students scoring in the Distinguished range is thought of as a bonus, and it will be offset by a negative 0.5 for each Novice student in the group.
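To make the arithmetic concrete, here is a minimal sketch in Python of how that weighting might work. The description above leaves some ambiguity about how the Novice offset interacts with the Distinguished bonus, so this assumes the bonus pool simply cannot drop below zero; the function name and the example numbers are mine, not KDE's.

```python
def achievement_index(novice: int, apprentice: int, proficient: int,
                      distinguished: int) -> float:
    """Average points per student: Novice = 0, Apprentice = 0.5,
    Proficient = 1.0, Distinguished = 1.5, with the Distinguished
    bonus (0.5 each) offset by 0.5 for each Novice student."""
    total = novice + apprentice + proficient + distinguished
    base = 0.5 * apprentice + 1.0 * (proficient + distinguished)
    # Assumption: the bonus is floored at zero rather than going negative.
    bonus = max(0.0, 0.5 * distinguished - 0.5 * novice)
    return (base + bonus) / total

# Example: 10 Novice, 30 Apprentice, 40 Proficient, 20 Distinguished.
print(achievement_index(10, 30, 40, 20))  # 0.8
```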

But what about measuring the achievement gap?

Draut told conference attendees that “we really feel like, in the gap world, we got a really innovative model to measure gap.” He explained that KDE had three problems with measuring the gap, which they have tried to address:

  1. The number of different subgroups, which can translate into as many as 45 different goals.
  2. A lot of the kids fall into more than one group. For example, 80 percent of Kentucky’s African American kids receive free and reduced lunch, 80 percent of our ELL kids are in the free and reduced lunch group, and 70 percent of our special education students receive free and reduced lunch. As a result, a single student might be counted as many as four times under the present system. Miss one target and you are likely to miss four.
  3. Comparing gap to group. Draut said, “You want to close African American to white; American Indian to white… Because of the way testing works, you can end up with a ‘wavy’ pattern.” For example, in JCPS one year, the white kids at Southern Middle School dropped backward while the African American kids stayed the same, and the Courier-Journal reported that Southern Middle was closing the gap. The opposite can also happen (which was our experience at Cassidy): white kids go up 8 points, African American kids go up 6 points, and the gap increases.

To address this, KDE plans to present all of the gap students’ data, but will create a new single group of underperforming “gap kids” who would be counted only once. Instead of comparing the gap kids to a comparison group, their performance would be measured against the goal of 100 percent proficiency, or what is called “gap to goal.” The gap is to be divided by the number of years schools are given to reach their goal, and schools would be awarded points based on the percentage of that goal they were able to close. A school that had six goals to meet and closed three of them would earn 50 percent of its points.
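As a rough sketch of that “gap to goal” arithmetic, consider the Python below. The function names, the point scale, and the idea of a fixed per-year target are my own illustration of the description above, not KDE's actual formula.

```python
def annual_gap_target(pct_proficient: float, years_to_goal: int) -> float:
    """The gap to 100 percent proficiency, divided evenly across
    the years a school is given to reach the goal."""
    gap = 100.0 - pct_proficient
    return gap / years_to_goal

def points_earned(goals_met: int, goals_total: int,
                  max_points: float = 100.0) -> float:
    """Award points in proportion to the share of goals closed."""
    return max_points * goals_met / goals_total

# A school starting at 58 percent proficient with five years to reach
# 100 percent would need to close 8.4 points of gap per year.
print(annual_gap_target(58.0, 5))  # 8.4
# The example from the text: six goals, three closed -> half the points.
print(points_earned(3, 6))         # 50.0
```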

Growth is to be measured using a regression of the reading and mathematics scores, the only tests given every year in grades 3-8. It will compare a student’s progress to that of other students who have been performing similarly. Given a proficient 5th-grade student with a scale score of 230 who then earns a score of 240 in 6th grade, the model asks whether this level of growth is typical of other Kentucky students, above average, or below average, and awards points accordingly.
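Here is a minimal sketch of that growth idea, assuming a simple linear regression of this year's score on last year's (the real model is likely more sophisticated). The data, the “typical” band, and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scale scores for 1,000 students in grades 5 and 6.
grade5 = rng.normal(230, 15, 1000)
grade6 = 0.9 * grade5 + 35 + rng.normal(0, 8, 1000)

# Regress grade-6 scores on grade-5 scores across all students.
slope, intercept = np.polyfit(grade5, grade6, 1)

def growth_category(prior: float, current: float, band: float = 5.0) -> str:
    """Compare a student's actual score to the score predicted from
    students who started at the same place."""
    predicted = slope * prior + intercept
    residual = current - predicted
    if residual > band:
        return "above typical"
    if residual < -band:
        return "below typical"
    return "typical"

# The example from the text: a scale score of 230 in 5th grade, 240 in 6th.
print(growth_category(230.0, 240.0))
```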

KSN&C caught up with Draut after his presentation.

KSN&C: Ken, as you may be aware, a number of statisticians… like Skip Kifer… say that the modeling that underlies the statistics of the ACT Benchmarks is a bunch of crap, basically. [chuckles]

Draut: Right.

KSN&C: Are you concerned about that?

Draut: Well, this is how we answered the board the other day: It’s you guys. You guys drive this. If you say the ACT is a bunch of crap, let’s throw it out…

KSN&C: Well, not the ACT. Just the benchmarks.

Draut: Well, I’m just saying, if you all say it, and then you put something else in, we’ll line right up, because we’re trying to get them ready for you.

KSN&C: OK, but you lost me. Tell me who “you” is. Because you’re saying the board…

Draut: Universities.

KSN&C: Oh.

Draut: You see, we’re driven by the universities. We can’t get our kids into the universities unless we meet your criteria.

KSN&C: So if the universities say, this standard isn’t appropriate, or the metric’s wrong, or something, then that’s going to be a problem for you guys.

Draut: Well, we’ll put in whatever you say, but I tell you what the issue is, and we’ve said this to several people: tell us what you’d replace it with.

KSN&C: Uh huh.

Draut: Just tell us.

KSN&C: So, the benchmarks are useful because they are there… But you have to know what they mean or they’re meaningless. And you can’t really replace them with the ACT, because that cuts out middle school and… causes you some other problems.

Draut: Right.

KSN&C: So, then, what do I replace it with? I’ve got a bad yardstick, but it’s the best one I’ve got?

Draut: And what are the universities going to accept to get the kids in the door? Because whatever the universities accept, that’s what I’ve got to get my kids ready for.

KSN&C: Are you getting that kind of pushback from the universities?

Draut: No.

KSN&C: So the question’s been raised but nobody’s pushing the issue?

Draut: No. It’s kinda like just what you said, tell me what’s in its place?

KSN&C: And nothing comes to mind.

Draut: So now you open up fifty years of research saying, hey, we can tell you it works. It does predict…

KSN&C: Do we know the degree to which those benchmarks are bad, or in what direction they are bad? Or is it that we just don’t know?

Draut: I think that you’d have to do some reading, both the pro and con. I’ve heard Skip, but when I read the ___ of it, it makes a lot of sense. And when I hear Skip it makes sense, too. I can’t get a sense of which one’s right… But that whole issue is driven by CPE and the universities, because if you’re sitting there in the university saying we’re only going to take the kids that make the CPE benchmark, and we’re only going to take the COMPASS, then we say, OK, and we line up with you. But if universities change… and say, you know, we’re not going to use ACT, we’re going to use some new testing, then we’ll realign everything [to that]… But I think it would be useful to look at both the pro and the con.

For a few months now, I've been pondering Draut's position that decisions made at CPE should drive the model ultimately adopted by the Kentucky Board of Education. Generally, I agree that we cannot lower standards and that KDE must hit college-ready targets. But I'm much less convinced that CPE ought to dictate how the achievement gap is measured in our elementary and middle schools.

NOTE: It is my understanding that the EXPLORE can predict results on the PLAN test, but not the ACT. The PLAN test can predict performance on the ACT but not performance in college. The ACT can predict performance in college up to a point, and it's arguably not the best measure, but it is made better by the inclusion of other measures.
