Risk assessment tools are flawed—should we throw them away?
This month, two research scientists and an attorney published an op-ed about risk assessment tools, which are presented as ways to reduce personal bias in the criminal legal system. Chelsea Barabas, Karthik Dinakar, and Colin Doyle argue: “When it comes to predicting violence, risk assessments offer more magical thinking than helpful forecasting.” The simple labels used by risk assessments (high or low risk, for example) “obscure the deep uncertainty of their actual predictions. Largely because pretrial violence is so rare, it is virtually impossible for any statistical model to identify people who are more likely than not to commit a violent crime.”
The authors note that a vast majority of even those deemed highest risk will not commit a violent crime while awaiting trial, so the tools, if they were accurate, should “simply predict that every person is unlikely to commit a violent crime while on pretrial release.” Instead, many risk assessments “sacrifice accuracy for the sake of making questionable distinctions among people who all have a low, indeterminate or incalculable likelihood of violence.” These tools scare judges about a risk for violence without providing them “any sense of the underlying likelihood or uncertainty of this prediction,” which could “easily lead judges to overestimate the risk of pretrial violence and detain far more people than is justified.”
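To see why a rare outcome defeats this kind of forecasting, consider a rough arithmetic sketch. The numbers below are hypothetical, chosen only to illustrate the op-ed’s point; they are not figures from any actual tool or jurisdiction.

```python
# Illustrative base-rate arithmetic; the numbers are hypothetical, not from the op-ed.
base_rate = 0.02        # assume 2% of released defendants commit a violent crime pretrial
relative_risk = 3.0     # assume a "high risk" label triples that likelihood

high_risk_rate = base_rate * relative_risk
print(f"'High risk' defendants who will NOT commit a violent crime: {1 - high_risk_rate:.0%}")  # 94%

# A model that simply labels everyone "unlikely to be violent" is right 98% of the time.
print(f"Accuracy of predicting 'no violence' for everyone: {1 - base_rate:.0%}")
```

Even under these generous assumptions, the overwhelming majority of people flagged as “high risk” would not go on to commit violence, which is the uncertainty the authors say the labels obscure.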
In a statement signed by other experts, the op-ed authors argue that risk assessment tools that include violations such as missed payments in their definition of risk can actually increase pretrial detention. And they point out that using arrest and conviction histories means that people of color are disproportionately labeled as dangerous. These fundamental, technical problems “cannot be resolved,” they conclude. “We strongly recommend turning to other reforms.”
In response, three scholars wrote that we should think twice before throwing away risk assessment tools entirely. Psychology professor Sarah Desmarais, law professor Brandon Garrett, and computer science professor Cynthia Rudin write that the op-ed and statement contain inaccuracies. They note that most risk assessment tools do not rely on arrest records. And many disentangle flight risk from danger to public safety. “While most validation studies measure pretrial criminal activity by looking at new arrests, this is not a problem inherent in the tools but rather in how the tools are being studied,” they write. “Instead of throwing out the tools, a reasonable solution would be to conduct research on their ability to predict other indicators of pretrial criminal activity.”
They also note that although risk assessments do factor in criminal history, that is the kind of information that judges weigh heavily in the absence of a risk assessment, so getting rid of the tool would not solve that problem.
“While there are technical challenges, it is extreme to claim that no remedy exists, and to insist that we make decisions without using data and statistics,” they conclude. “To call risk assessment fundamentally flawed suggests that we should abandon reforms and keep things the way they are. Instead, we need to give judges better information. No human being is an expert predictor. Relying on empirical data is far superior to going with one’s gut, if it is the right data, carefully analyzed, and presented in such a way as to minimize bias. In fact, statistical tools can be specially designed to help reduce the biases that are—obviously—inherent in the data.”
And this month a new study lent credence to the criticisms of risk assessments, while putting forward possible solutions. In 2016, ProPublica published a blockbuster article examining risk assessments in one Florida county, finding that Black defendants were almost twice as likely as white defendants to be “false positives,” labeled high risk when they did not go on to commit a crime. White defendants who did go on to commit a crime, by contrast, were more likely than Black defendants to be labeled low risk.
“With the new study, the Center for Court Innovation wanted to determine if they would reach the same conclusions using a different tool in a different place,” writes Beth Schwartzapfel for The Marshall Project. They chose New York City, and a theoretical scenario, but their findings were almost exactly the same as ProPublica’s. “Among those who were not later arrested, almost a quarter of Black defendants were classified as high risk—which would have likely meant awaiting trial in jail—compared with 17 percent of Hispanic defendants, and just 10 percent of white defendants.”
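For readers wondering what a “false positive” means here: it is someone who was not later arrested but was labeled high risk anyway, and the rate is computed separately for each group. A minimal sketch with invented data (not the ProPublica or Center for Court Innovation datasets) looks like this:

```python
# Hypothetical records: (group, labeled_high_risk, later_arrested). Values are invented.
records = [
    ("Black", True, False), ("Black", False, False), ("Black", True, True),
    ("Black", True, False), ("white", False, False), ("white", False, False),
    ("white", True, True), ("white", True, False),
]

for group in sorted({g for g, _, _ in records}):
    # False positive rate: share of people not later arrested who were labeled high risk.
    flags = [high for g, high, arrested in records if g == group and not arrested]
    print(f"{group}: false positive rate = {sum(flags) / len(flags):.0%}")
```

Both studies found this rate to be far higher for Black defendants than for white defendants, even when overall accuracy looked similar across groups.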
“There’s no way to square the circle there, taking the bias out of the system by using data generated by a system shot through with racial bias,” Matt Watkins, senior writer at the Center for Court Innovation, and one of the authors of the paper, told Schwartzapfel. But it makes no sense to do away with these tools in a country where “business as usual, without the use of risk assessment, results in over-incarceration and racial bias in incarceration,” said Julian Adler, the Center for Court Innovation’s director of policy and research. His group encourages using the algorithm in context—as part of a larger decision-making framework that’s sensitive to issues of racial justice.
“In their study, the Center for Court Innovation researchers applied their risk assessment to various scenarios to see whether they could mitigate its racial bias and still cut back on rates of people sent to jail pretrial,” writes Schwartzapfel. “They found that if judges made decisions based primarily on the seriousness of the charges, then layered risk assessment on top of that, dramatically fewer people would go to jail, and the rate of racially disparate false positives would almost disappear.” In that scenario, anyone charged with a misdemeanor or nonviolent felony would automatically go home. Judges would only use risk assessment tools for the more serious cases. Researchers found this would cut pretrial detention by more than half and eliminate the racial bias in false positives.
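A bare-bones sketch of that kind of charge-first rule might look like the following; the charge categories, cutoff value, and function name are assumptions for illustration, not the Center for Court Innovation’s actual model:

```python
def pretrial_decision(charge_severity: str, risk_score: int, high_risk_cutoff: int = 8) -> str:
    """Charge-first rule: release automatically for less serious charges, and consult
    the risk score only for serious cases. All thresholds here are illustrative."""
    if charge_severity in ("misdemeanor", "nonviolent felony"):
        return "release"                      # automatic release; risk score never consulted
    if risk_score >= high_risk_cutoff:
        return "detain or impose conditions"  # risk tool applied only to serious charges
    return "release"

print(pretrial_decision("misdemeanor", risk_score=9))      # release, regardless of score
print(pretrial_decision("violent felony", risk_score=9))   # detain or impose conditions
```

The point of the design is that the algorithm never gets a chance to recommend jail for the lower-level charges, which is where the study found most of the racially disparate false positives.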
“That’s why the study is called ‘Beyond the Algorithm,’” Adler said. It’s about using “other tools at our disposal to create a suite of strategies to accomplish what we’re aiming at.”