The Volokh Conspiracy
Stealth Quotas
The Dangerous Cure for "AI bias"
You probably haven't given much thought recently to the wisdom of racial and gender quotas that allocate jobs and other benefits to racial and gender groups based on their proportion of the population. That debate is pretty much over. Google tells us that discussion of racial quotas peaked in 1980 and has been declining ever since. While still popular with some on the left, they have been largely rejected by the country as a whole. Most recently, in 2019 and 2020, deep blue California voted to keep in place a ban on race and gender preferences. So did equally left-leaning Washington state.
So you might be surprised to hear that quotas are likely to show up everywhere in the next ten years, thanks to a growing enthusiasm for regulating technology – and a large contingent of Republican legislators. That, at least, is the conclusion I've drawn from watching the movement to find and eradicate what's variously described as algorithmic discrimination or AI bias.
Claims that machine learning algorithms disadvantage women and minorities are commonplace today. So much so that even centrist policymakers agree on the need to remedy that bias. It turns out, though, that the debate over algorithmic bias has been framed so that the only possible remedy is widespread imposition of quotas on algorithms and the job and benefit decisions they make.
To see this phenomenon in action, look no further than two very recent efforts to address AI bias. The first is contained in a privacy bill, the American Data Privacy and Protection Act (ADPPA). The ADPPA was embraced almost unanimously by Republicans as well as Democrats on the House Energy and Commerce Committee; it has stalled a bit, but still stands the best chance of enactment of any privacy bill in a decade (its supporters hope to push it through in a lame-duck session). The second is part of the AI Bill of Rights released last week by the Biden White House.
Dubious claims of algorithmic bias are everywhere
I got interested in this issue when I began studying claims that algorithmic face recognition was rife with race and gender bias. That narrative has been pushed so relentlessly by academics and journalists that most people assume it must be true. In fact, I found, claims of algorithmic bias are largely outdated, false, or incomplete. They've nonetheless been sold relentlessly to the public. Tainted by charges of racism and sexism, the technology has been slow to deploy, at a cost to Americans of massive inconvenience, weaker security, and billions in wasted tax money – not to mention driving our biggest tech companies from the field and largely ceding it to Chinese and Russian competitors.
The attack on algorithmic bias in general may have even worse consequences. That's because, unlike other antidiscrimination measures, efforts to root out algorithmic bias lead almost inevitably to quotas, as I'll try to show in this article.
Race and gender quotas are at best controversial in this country. Most Americans recognize that there are large demographic disparities in our society, and they are willing to believe that discrimination has played a role in causing the differences. But addressing disparities with group remedies like quotas runs counter to a deep-seated belief that people are, and should be, judged as individuals. Put another way, given a choice between fairness to individuals and fairness on a group basis, Americans choose individual fairness. They condemn racism precisely for its refusal to treat people as individuals, and they resist remedies grounded in race or gender for the same reason.
The campaign against algorithmic bias seeks to overturn this consensus – and to do so largely by stealth. The ADPPA that so many Republicans embraced is a particularly instructive example. It begins modestly enough, echoing the common view that artificial intelligence algorithms need to be regulated. It requires an impact assessment to identify potential harms and a detailed description of how those harms have been mitigated. Chief among the harms to be mitigated is race and gender bias.
So far, so typical. Requiring remediation of algorithmic bias is a nearly universal feature of proposals to regulate algorithms. The White House blueprint for an artificial intelligence bill of rights, for example, declares, "You should not face discrimination by algorithms and systems should be used and designed in an equitable way."
All roads lead to quotas
The problems begin when the supporters of these measures explain what they mean by discrimination. In the end, it always boils down to "differential" treatment of women and minorities. The White House defines discrimination as "unjustified different treatment or impacts disfavoring people based on their race, color, ethnicity, [and] sex," among other characteristics. While the White House phrasing suggests that differential impacts on protected groups might sometimes be justified, no such justification is in fact allowed in its framework. Any disparities that could cause meaningful harm to a protected group, the document insists, "should be mitigated."
The ADPPA is even more blunt. It requires that, among the harms to be mitigated is any "disparate impact" an algorithm may have on a protected class – meaning any outcome where benefits don't flow to a protected class in proportion to its numbers in society. Put another way, first you calculate the number of jobs or benefits you think is fair to each group, and any algorithm that doesn't produce that number has a "disparate impact."
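To make the arithmetic concrete, here is a minimal sketch, in Python with entirely hypothetical group names and numbers, of the kind of disparate-impact screen this framing implies: compute each group's selection rate and flag any group that falls below a set fraction of the best-performing group's rate (a threshold in the style of the familiar four-fifths rule). Nothing here comes from the bill's text; it only illustrates how purely statistical the definition is.

```python
# Hypothetical sketch of a disparate-impact screen; names and numbers are invented.
def disparate_impact(approved_by_group, applicants_by_group, tolerance=0.8):
    """Flag groups whose approval rate falls below `tolerance` times the
    highest group's approval rate (a four-fifths-rule style test)."""
    rates = {g: approved_by_group[g] / applicants_by_group[g]
             for g in applicants_by_group}
    best = max(rates.values())
    return {g: r for g, r in rates.items() if r < tolerance * best}

# Invented numbers purely for illustration:
applicants = {"group_a": 1000, "group_b": 800}
approved   = {"group_a": 300,  "group_b": 180}
print(disparate_impact(approved, applicants))   # {'group_b': 0.225}
```

Note that nothing in the calculation asks why the rates differ; any accurate algorithm trained on a world with real demographic differences will trip the flag.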
Neither the White House nor the ADPPA distinguishes between correcting disparities caused directly by intentional and recent discrimination and disparities resulting from a mix of history and individual choices. Neither asks whether eliminating a particular disparity will work an injustice on individuals who did nothing to cause the disparity. The harm is simply the disparity, more or less by definition.
Defined that way, the harm can only be cured in one way. The disparity must be eliminated. For reasons I'll discuss in more detail shortly, it turns out that the disparity can only be eliminated by imposing quotas on the algorithm's outputs.
The sweep of this new quota mandate is breathtaking. The White House bill of rights would force the elimination of disparities "whenever automated systems can meaningfully impact the public's rights, opportunities, or access to critical needs" – i.e., everywhere it matters. The ADPPA in turn expressly mandates the elimination of disparate impacts in "housing, education, employment, healthcare, insurance, or credit opportunities."
And quotas will be imposed on behalf of a host of interest groups. The bill demands an end to disparities based on "race, color, religion, national origin, sex, or disability." The White House list is far longer; it would lead to quotas based on "race, color, ethnicity, sex (including pregnancy, childbirth, and related medical conditions, gender identity, intersex status, and sexual orientation), religion, age, national origin, disability, veteran status, genetic information, or any other classification protected by law."
Blame the machine and send it to reeducation camp
By now, you might be wondering why so many Republicans embraced this bill. The best explanation was probably offered years ago by Sen. Alan Simpson (R-WY): "We have two political parties in this country, the Stupid Party and the Evil Party. I belong to the Stupid Party." That would explain why GOP committee members didn't read this section of the bill, or didn't understand what they read.
To be fair, it helps to have a grasp of the peculiarities of machine learning algorithms. First, they are often uncannily accurate. In essence, machine learning exposes a neural network computer to massive amounts of data and then tells it what conclusion should be drawn from the data. If we want it to recognize tumors from a chest x-ray, we show it millions of x-rays, some with lots of tumors, some with barely detectable tumors, and some with no cancer at all. We tell the machine which x-rays belong to people who were diagnosed with lung cancer within six months. Gradually the machine begins to find not just the tumors that specialists find but subtle patterns, invisible to humans, that it has learned to associate with a future diagnosis of cancer. This oversimplified example illustrates how machines can learn to predict outcomes (such as which drugs are most likely to cure a disease, which websites best satisfy a given search term, and which borrowers are most likely to default) far better and more efficiently than humans.
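For readers who want to see the shape of that training loop, here is a toy sketch using scikit-learn. The "x-ray features" and diagnosis labels below are random stand-ins, not real medical data; the point is only the workflow: show the machine labeled examples, let it find predictive patterns, then score it on held-out cases.

```python
# Toy illustration of supervised learning; all data is a synthetic stand-in.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 64))             # e.g. 64 image-derived features per scan
y = (X[:, :3].sum(axis=1) > 0).astype(int)  # stand-in for "diagnosed within 6 months"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(X_train, y_train)                 # the machine "learns" the association
print("held-out accuracy:", model.score(X_test, y_test))
```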
Second, the machines that do this are famously unable to explain how they achieve such remarkable accuracy. This is frustrating and counterintuitive for those of us who work with the technology. But it remains the view of most experts I've consulted that the reasons for the algorithm's success cannot really be explained or understood; the machine can't tell us what subtle clues allow it to predict tumors from an apparently clear x-ray. We can only judge it by its outcomes.
Still, those outcomes are often much better than any human can match, which is great, until they tell us things we don't want to hear, especially about racial and gender disparities in our society. I've tried to figure out why the claims of algorithmic bias have such power, and I suspect it's because machine learning seems to show a kind of eerie sentience.
It's almost human. If we met a human whose decisions consistently treated minorities or women worse than others, we'd expect him to explain himself. If he couldn't, we'd condemn him as a racist or a sexist and demand that he change his ways.
To view the algorithm that way, of course, is just anthropomorphism, or maybe misanthropomorphism. But this tendency shapes the public debate; academic and journalistic studies have no trouble condemning algorithms as racist or sexist simply because their output shows disparate outcomes for different groups. By that reductionist measure, of course, every algorithm that reflects the many demographic disparities in the real world is biased and must be remedied.
And just like that, curing AI bias means ignoring all the social and historical complexities and all the individual choices that have produced real-life disparities. When those disparities show up in the output of an algorithm, they must be swept away.
Not surprisingly, machine learning experts have found ways to do exactly that. Unfortunately, for the reasons already given, they can't unpack the algorithm and separate the illegitimate from the legitimate factors that go into its decisionmaking.
All they can do is send the machine to reeducation camp. They teach their algorithms to avoid disparate outcomes, either by training the algorithm on fictional data that portrays a "fair" world in which men and women all earn the same income and all neighborhoods have the same crime rate, or simply by penalizing the machine when it produces results that are accurate but lack the "right" demographics. Reared on race and gender quotas, the machine learns to reproduce them.
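As a rough sketch of the second technique, the penalty approach, consider an ordinary logistic regression trained by gradient descent with an extra term that pushes the model's average scores for two groups toward each other (what researchers call a "demographic parity" penalty). Everything below is synthetic, the protected-group column is hypothetical, and this is only one of several methods used in practice.

```python
# Hedged sketch: logistic regression with a demographic-parity penalty.
# Data and the protected attribute are synthetic; lam controls the penalty.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 5
group = rng.integers(0, 2, size=n)     # hypothetical protected attribute
X = rng.normal(size=(n, d))
X[:, 1] += 1.2 * group                 # one feature acts as a proxy for the group
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, lam = np.zeros(d), 5.0
for _ in range(500):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n           # ordinary log-loss gradient
    gap = p[group == 1].mean() - p[group == 0].mean()
    s = p * (1 - p)
    dgap = (X[group == 1] * s[group == 1][:, None]).mean(axis=0) \
         - (X[group == 0] * s[group == 0][:, None]).mean(axis=0)
    grad += lam * gap * dgap           # penalty: shrink the gap between group scores
    w -= 0.5 * grad

p = sigmoid(X @ w)
print("positive rate, group 0:", (p[group == 0] > 0.5).mean())
print("positive rate, group 1:", (p[group == 1] > 0.5).mean())
```

Raise lam and the two rates converge; lower it and accuracy on the original task recovers. That trade-off is exactly the cost discussed next.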
All this reeducating has a cost. The quotafied output is less accurate, perhaps much less accurate, than that of the original "biased" algorithm, though it will likely be the most accurate results that can be produced consistent with the racial and gender constraints. To take one example, an Ivy League school that wanted to select a class for academic success could feed ten years' worth of college applications into the machine along with the grade point averages the applicants eventually achieved after they were admitted. The resulting algorithm would be very accurate at picking the students most likely to succeed academically. Real life also suggests that it would pick a disproportionately large number of Asian students and a disproportionately small number of other minorities.
The White House and the authors of the ADPPA would then demand that the designer reeducate the machine until it recommended fewer Asian students and more minority students. That change would have costs. The new student body would not be as academically successful as the earlier group, but thanks to the magic of machine learning, it would still accurately identify the highest achieving students within each demographic group. It would be the most scientific of quota systems.
That compromise in accuracy might well be a price the school is happy to pay. But the same cannot be said for the individuals who find themselves passed over solely because of their race. Reeducating the algorithm cannot satisfy the demands of individual fairness and group fairness at the same time.
How machine learning enables stealth quotas
But it can hide the unfairness. When algorithms are developed, all the machine learning, including the imposition of quotas, happens "upstream" from the institution that will eventually rely on it. The algorithm is educated and reeducated well before it is sold or deployed. So the scale and impact of the quotas it's been taught to impose will often be hidden from the user, who sees only the welcome "bias-free" outcomes and can't tell whether (or how much) the algorithm is sacrificing accuracy or individual fairness to achieve demographic parity.
In fact, for many corporate and government users, that's a feature, not a bug. Most large institutions support group over individual fairness; they are less interested in having the very best work force -- or freshman class, or vaccine allocation system -- than they are in avoiding discrimination charges. For these institutions, the fact that machine learning algorithms cannot explain themselves is a godsend. They get outcomes that avoid controversy, and they don't have to answer hard questions about how much individual fairness has been sacrificed. Even better, the individuals who are disadvantaged won't know either; all they will know is that "the computer" found them wanting.
If it were otherwise, of course, those who got the short end of the stick might sue, arguing that it's illegal to deprive them of benefits based on their race or gender. To head off that prospect, the ADPPA bluntly denies them any right to complain. The bill expressly states that, while algorithmic discrimination is unlawful in most cases, it's perfectly legal if it's done "to prevent or mitigate unlawful discrimination" or for the purpose of "diversifying an applicant, participant, or customer pool." There is of course no preference that can't be justified using those two tools. They effectively immunize algorithmic quotas, and the big institutions that deploy them, from charges of discrimination.
If anything like that provision becomes law, "group fairness" quotas will spread across much of American society. Remember that the bill expressly mandates the elimination of disparate impacts in "housing, education, employment, healthcare, insurance, or credit opportunities." So if the Supreme Court this term rules that colleges may not use admissions standards that discriminate against Asians, in a world where the ADPPA is law, all the schools will have to do is switch to an appropriately reeducated admissions algorithm. Once laundered through an algorithm, racial preferences that otherwise break the law would be virtually immune from attack.
Even without a law, demanding that machine learning algorithms meet demographic quotas will have a massive impact. Machine learning algorithms are getting cheaper and better all the time. They are being used to speed many bureaucratic processes that allocate benefits, from handing out food stamps and setting vaccine priorities to deciding who gets a home mortgage, a donated kidney, or admission to college. As shown by the White House AI Bill of Rights, it is now conventional wisdom that algorithmic bias is everywhere and that designers and users have an obligation to stamp it out. Any algorithm that doesn't produce demographically balanced results is going to be challenged as biased, so for companies that offer algorithms the course of least resistance is to build the quotas in. Buyers of those algorithms will ask about bias and express relief when told that the algorithm has no disparate impact on protected groups. No one will give much thought (or even, if the ADPPA passes, a day in court) to individuals who lose a mortgage, a kidney, or a place at Harvard in the name of group justice.
That's just not right. If we're going to impose quotas so widely, we ought to make that choice consciously. Their stealthy spread is bad news for democracy, and probably for fairness.
But it's good news for the cultural and academic left, and for businesses who will do anything to get out of the legal crossfire over race and gender justice. Now that I think about it, maybe that explains why the House GOP fell so thoroughly into line on the ADPPA. Because nothing is more tempting to a Republican legislator than a profoundly stupid bill that has the support of the entire Fortune 500.
I love the argument that facial recognition is transphobic because the algorithms haven't figured out how to ask the person their pronouns.
AI doesn't care. Apparently, neither does the Selective Service.
If you want the cranky old white male perspective, the Volokh Conspiracy is the place!
Nyehhh, this one just feels lazy.
Just the facts, clingers.
Rev. Arthur L. Kirkland
There’s nothing to be discussed about the asinine sledge hammer of racism accusations in mathematical analysis of data? Just jump straight to social arm twisting by Terminator T-1000?
That was how they determined social problems to fix in the first place decades ago, before computers even.
Rev. Arthur L. Kirkland
To show how well-grounded this diatribe is, Baker leads off by telling us that the reason American companies have shied away from facial recognition is concern about bias, and completely fails to mention the privacy issues that actually drive most of the conversation on the topic. Not sure if he's wrong because he's ignorant or intentionally being extremely misleading, but it sets a tone for how seriously to take the rest of the piece.
What you think drives "most of the conversation on the topic" doesn't matter. What activists are actually trying to change does.
Please find some example of a US tech company deciding not to use facial recognition because of bias issues. On the other hand, Facebook specifically called out privacy in its decision to stop using facial recognition, and all of the US legislation I've seen on the topic is privacy-related as well.
I guess if there's folks like you that will just credulously accept whatever nonsense Baker throws out there, it's not surprising he continues to do so.
Again, you just try to shift the burden of the analysis rather than facing it honestly.
Baker's a former counsel for the NSA. He doesn't care about privacy. For years now, his attitude whenever computer privacy or encryption is mentioned has basically been, "Ah, privacy, shmivacy, law enforcement and intelligence agencies need to get the bad guys. Nothing else matters."
What article are you reading? Looking above, I can't even find a mention by Baker of facial recognition.
Right here in the article:
"I got interested in this issue when I began studying claims that algorithmic face recognition was rife with race and gender bias. That narrative has been pushed so relentlessly by academics and journalists that most people assume it must be true. In fact, I found, claims of algorithmic bias are largely outdated, false, or incomplete."
There's a link to a longer article with his findings. Reading it I got the impression that there were still a lot more problems than he implies here.
jb,
Rather than address what is incorrect with Baker's analysis, you brand it as a diatribe and then bridge to a completely different topic that may in fact be unrelated to the matter discussed by Baker.
You then charge him with dishonesty, ill will or both.
I'd say your short outburst has just those characteristics.
By the way, how much have you studied AI in a serious way? 1 hour, 1 day, 1 month or ≥1 year
Baker brought facial recognition up in the context of bias, not me. (Apparently this is what got him interested in the topic of AI bias in the first place!) It seems reasonable to point out that one of the points in his piece is incorrect and dishonest even if it is not the central thesis of the rest of the piece.
(I'll address the bias/quotas topic below, where there's plenty of good discussion. We don't have to talk about all the things all the time in all the places.)
Don't forget, Baker absolutely hates the very concept of privacy with a burning passion. He has spent years here ranting about any policy designed to promote the concept.
"Unfortunately, for the reasons already given, they can't unpack the algorithm and separate the illegitimate from the legitimate factors that go into its decisionmaking."
You're being too generous. The important thing to remember is that there generally isn't any reason to think there are any illegitimate factors. The computer relentlessly determines what is relevant, and discards what isn't. This process tends to eliminate, not accentuate, genuinely illegitimate factors.
The only reason to suppose they exist is ideology, most of the time.
"The only reason to suppose they exist is ideology, most of the time."
That is a good description
Remember those sinister AIs in movies and TV who could be outwitted by a lone human hero armed with plain common sense?
Escapist fantasy is fun.
Well I remember a few occasions where Captain Kirk used logic and reason to talk a superintelligent machine into destroying itself.
I think The Prisoner and Dr. Who also used this plot device.
In case you're not aware the Pluto TV free app now has a 24/7 old original Dr. Who channel.
Interesting, thank you.
And I forgot that Dr. Who (SPOILER ALERT) doesn’t outwit the computer in The Green Death, he just gets a guy to blow it up.
Exterminate! Exterminate!
That's....a rough one.
One of the issues with these algorithms is that they are correlation-based. And depending on what goes into the data, what comes out can be a self-fulfilling prophecy. And they're based on historical data. So, the machine draws a correlation. African Americans are less likely to do well in college, according to the historical data. So, it spits out that result as to who will and won't do well in college in the future.
At the same time "fixing" the machines by introducing artificial quotas isn't appropriate either.
Generally, I agree. Algorithms that merely replicate existing discrepancies can perpetuate them. But I'm not as worried about them because algorithms that merely replicate existing discrepancies also aren't all that interesting or useful. They don't tell you anything new, which means they don't lead to a competitive advantage (by which I mean profit) for the person following the algorithm's advice.
Okay, there's still Baker's cynical (but also somewhat true) comment about algorithms insulating companies from accusations of discrimination, but that's not enough to justify the expense of the algorithm developers. They want new insights so they can make more money, not merely perpetuate the same practices at existing profit levels.
That's true, the developers want the best algorithm. The managers who set their compensation may just want the best algorithm that won't get them sued. Or to be a bit fairer, they're seeking to maximize revenue after accounting for legal expenses.
A good algorithm could be better than a lesser algorithm for profits, even after having a quota overlaid over it, because it will allow you to implement the quota at the highest level of profit.
The thing to worry about here is that AI with wokeness built in could eventually be everywhere, if this isn't stopped. It will be built into the training sets, the general databases that researchers use, everything. It could become inescapable.
Your self-driving car could end up monitoring the race of the people in nearby cars, just to make sure the accident statistics aren't 'discriminatory', instead of trying to keep them as low as possible.
"that’s not enough to justify the expense of the algorithm developers. They want new insights so they can make more money, not merely perpetuate the same practices at existing profit levels."
Talk about cynicism. You spout it at full volume.
No, that's a reasonable take: There's no point in generating an AI system that merely perpetuates the status quo, because you're already AT the status quo, and developing the AI costs money. If developing it just gets you where you are already, that's wasted money.
Developing the AI has to be expected to reduce costs, or increase revenue, and by more than the cost of development, or doing the work is senseless.
It's accurate studies that discovered problems, so they could begin to be addressed.
Sorry but I have to disagree with this comment rather strongly. Those studies never discover the problem. They can't. You have to have some kind of external evidence to inspire the desire for the study in the first place. The studies merely quantify the problem (to varying degrees of accuracy and success).
Yes, quantification can help to address the problem - sometimes. Other times, the attempt to quantify becomes itself a part of the problem.
AI is generally not meant to do root cause analysis, but to produce quantitatively better outcomes.
It can do so in dramatic fashion, for example, by tripling user hours at complicated technical facilities.
Are there really artificial "quotas"? As far as I can tell, Baker doesn't link to any examples of them. Yes, people working in AI are using various techniques to avoid bias, but none of them rely on any sort of quota.
The concern described in the article above is not about analysts using techniques to avoid bias but about the inevitable consequence of companies trying to prove the non-existence of bias to regulators and litigants. The only acceptable proof is going to be a de facto return to quotas.
Oh, I'm glad you were able to figure out what he's trying to say.
If this is the argument, it could be made just as easily against basically any anti-discrimination rule, AI or not. Meanwhile, the techniques Baker is complaining about would presumably be the sort of thing you'd show to demonstrate that you were accounting for bias and modeling to avoid it.
It wasn't that complicated, I understood this to be his point, too.
In 45 years of programming, the phrase I muttered the most was "I hate computers"
For 30 years my most common epithet has been "F**K MICROSOFT!!!!"
When I was still programming, mostly it was, “Why the hell did it do THAT?”
I literally wrote a graphics program in college that didn't just crash the computer, it stayed crashed through a reboot if you didn't remember to power down the monitor...
What about CLEAR?
Progress is all cool and stuff until it isn't, apparently. There is absolutely no reason why a computer program incapable of human biases might do what it does....other than human biases which are obviously the reason behind it.
In some of these cases the computer is supposedly discriminating on a basis it wasn't even given data on. You give the computer a database without any race entries, you can be pretty darned certain it isn't going to be discriminating on the basis of race.
If the database contains data closely correlated with race, then the computer can discriminate based on race without having overt racial input. (It doesn't have to be anything pernicious; zip code of someone's residence will do an awful lot of work for you.)
That should be obvious, from the experience of schools trying to get around affirmative action bans like Prop 209 by using proxies for race.
The difference is that the schools or realtors or whatever HAD the race data to begin with. Having it, they could easily determine innocent looking proxies for race. Caring about race, they might be motivated to generate such proxies.
The computer doesn't have the race data unless given it, and thus has no way to generate proxies for race. If it ends up using zip code as a predictive factor, it's not because it was doing so to obscure racial discrimination, it's because zip code was, itself, a good predictive factor.
IOW, the computer may create models that have disparate impact, but if denied racial data, it can never create models that are racially discriminatory. Such disparate impact is always going to be innocent disparate impact.
Remember, innocent disparate impact is a thing. It's basically inevitable, even, if you don't discriminate on the basis of race, because the races aren't similarly situated.
That's the issue here: If you demand the computer model exhibit no disparate impact, you're inevitably forced to program the computer to discriminate on the basis of race. Because not discriminating has disparate impact in the real world!
If the variable you're trying to predict is highly correlated with race then the model will figure out the race. The zip code isn't a predictive factor because of the local geography, it's a predictive factor because it predicts race.
Say you're trying to predict if a borrower will default on their mortgage, and being a member of race 'X' is a big predictor of whether that happens. A lender, all else being equal, might rationally deny the mortgage based on that race, problem is that's illegal.
So you give the model a bunch of factors that correlate with race, zip code, job type, education level, etc. Without understanding the existence of race the model will start making predictions based on the borrower's race, again denying the mortgage based on the race, which is again illegal.
The important thing to understand is that the model doesn't actually have a concept of race. Even if you give it a spreadsheet with a column labeled race it's just meaningless numbers to the computer. So the distinction of giving it race directly (a column labeled race) vs indirectly (zip code) is meaningless.
"If the variable you’re trying to predict is highly correlated with race then the model will figure out the race."
No, no, no, NO. The computer has no motive to "figure out race" unless it was programmed to do it. It has no ability to "figure out race", because it has no racial data to begin with. If the variable you're trying to predict is highly correlated with race, your prediction will be correlated with race, but it can't be based on race; it will be based on zip code, income, whatever data you DID give it.
A model can't be "based on" data you deny it, unless you're redefining "based on" to mean nothing more than accidental correlation.
The computer has no concept of race even if you give it race, it's all just numbers.
If you give it zipcode, and the major determinant between zipcodes is the race of the occupants, then you're still giving it race.
As long as race is inferable from the data and race is a major determinant of the outcome, then your model is going to end up working on race.
I actually figured out a way where you might be able to make a fairly race-neutral dataset.
First, take the data and try to predict race from it. If the model fails, then there's not enough information in the data to infer race. Now you can probably get a race-neutral predictor of your chosen outcome.
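Something like this, with synthetic data and made-up column names (on real data, a score well above the base rate would mean race is inferable):

```python
# Sketch of the test: can a flexible model predict the protected attribute
# from the remaining features? Data here is synthetic and independent of "race".
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 10))        # candidate non-race features
race = rng.integers(0, 2, size=3000)   # hypothetical protected attribute

score = cross_val_score(GradientBoostingClassifier(), X, race, cv=5).mean()
print("accuracy predicting race from the features:", score)  # ~0.5 here, i.e. chance
```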
"If you give it zipcode, and the major determinate between zipcodes is the race of the occupants, then you’re still giving it race."
No, you're not. You're giving it zipcode. You can't infer race from non-racial data if you're a computer, because the data you're given is literally all the data you have. It's not like a human, who could recognize that a zip code is in Detroit or Flint Michigan, and infer from that that the people living there were black; The computer has no such background information! It doesn't KNOW that most people in Detroit are black!
You're trying to define any correlation with race as being based on race. But that's not honest. Rather, it's exactly the problem the OP is about: A flat refusal to admit that correlation with race isn't automatically due to racism, leading to a demand to ACTUALLY base results on race to negate a correlation that's really reflecting the data.
So I think there's a couple things going on here.
First, you're dancing around an old argument about the definition of racism.
Consider the two different scenarios:
a) "I'm denying you a mortgage because even though your other qualifications are solid you're black and I don't want to give mortgages to black people."
b) "I'm denying you a mortgage because even though your other qualifications are solid you're black and my experience is that black people are more likely to default."
Everyone agrees "a" is racist, but "b" is where people disagree, I think "b" is also racist, but it brings up some problematic scenarios.
Either way, this entire argument assumes "b" is racist, because if it wasn't then you could just give race to the model and as long as you didn't literally weigh race in the objective function the model couldn't be racist.
As for the second issue, this insistence that if you don't give the model a column labeled "race" then it can't know race... I can keep running in circles but I think it's clear that you haven't done much data science (building those kinds of models) because to anyone that has it's obviously false. The computer doesn't understand "race" any more than it understands "zip code", it's all just numbers. The "race" column has a strong correlation to real-world race, and the "zip code" column has a weaker correlation.
Think of it this way.
You never give the model race, or zip code, or any other data with a meaning we would understand it.
You give the model patterns; if one of your inputs is "race" then those patterns contain a lot of information about what we understand as "race", and if an input is "zip code" then it still has a decent amount of race info, especially if you also have things like job title.
To put it another way, one of the big strengths of ML is uncovering hidden relationships in the data, so even when you try to hide race the model can still uncover that relationship. Give me a dataset with a bunch of columns like race, job title, income, and literacy level about whether someone would have been allowed to drink at a "whites only" fountain in the 1950s south. Now drop the "race" column and I bet I still nail the prediction because the model has clearly uncovered the race relationship.
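Roughly, in code (all data synthetic, the "proxies" just invented correlated columns):

```python
# The label depends only on race; the race column is dropped; correlated proxy
# columns let the model recover it anyway. Purely illustrative synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 5000
race = rng.integers(0, 2, size=n)
proxies = np.column_stack([
    race + rng.normal(scale=0.4, size=n),   # think zip code
    race + rng.normal(scale=0.6, size=n),   # think job title / literacy level
    rng.normal(size=n),                     # an unrelated column for contrast
])
label = race                                # "allowed at the whites-only fountain"

X_tr, X_te, y_tr, y_te = train_test_split(proxies, label, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy without the race column:", clf.score(X_te, y_te))  # well above chance
```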
“Without understanding the existence of race the model will start making predictions based on the borrower’s race”
You've got it wrong. The AI won't make predictions based on race because it does not know race. It will make predictions based on age, zip, income, work history, etc.
You, as an outside observer, might see a racial aspect because you know the racial makeup of the group but, to the AI, race simply does not exist.
It's an ML model, nothing exists but numbers. You as an outside observer assume you gave the model race because you have a column called "race" but the model just sees another set of numbers.
Tell me you have no idea how AI models are trained without telling me you have no idea how AI models are trained.
Let me guess, the training involves sitting David Duke down with the computer and having him program it?
Nope, but the training data comes from humans so it generally comes with similar biases to them. This can be a feature or a bug depending on the application, but it’s inherent to the process.
So it is you who have never programmed anything to do with computers, got it. Thanks for the clarification.
LOL
Dude, go take an Intro to Machine Learning class so you can stop demonstrating what a clown you are to anyone who ever created a model.
Grutter was a well-established quota system, per the district court's finding of fact.
But CA6, de novo, suddenly found critical mass diversity.
Amazing what courts can find with blinders.
Not quite: machine learning algorithms are only "uncannily accurate" on the training data, which they overfit.*
Machine learning algorithms, like anything else: garbage in - garbage out. And as with any forecasting tool: the future is not like the past.
* with a linear regression on 100 data points, you can get 100% fit using 100 variables. That does not mean all 100 are relevant.
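That footnote is easy to check numerically; a quick sketch with pure noise as predictors:

```python
# 100 observations, 100 noise predictors: ordinary least squares "fits" perfectly
# in-sample and collapses on fresh data. Purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 100))
y = rng.normal(size=100)                    # target unrelated to X

model = LinearRegression().fit(X, y)
print("training R^2:", model.score(X, y))   # ~1.0
print("R^2 on fresh noise:", model.score(rng.normal(size=(100, 100)),
                                          rng.normal(size=100)))  # near zero or negative
```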
The usual answer to that these days is absolutely huge amounts of training data, to make sure you have a LOT more data points than variables.
Getting real training data is the tricky part, especially with models that have billions of parameters. Mostly people use tricks to reuse training data.
History is finite.
Plus, does the pre-COVID data really apply to today?
It's a significant problem that machine learning algorithms can't perform sensitivities to determine what variables are driving the results. In fact, there aren't even concepts of "variables." Hypothetically, if I were in a position to approve models for a bank, I wouldn't. I couldn't. SR 11-7 requires model sensitivities.
In some respects the author of this post is correct. If you don't know what's driving the results, you can't determine liability. Or can you? I am undecided on this because I am a firm believer in the law of unintended consequences.
Yes, it IS a significant problem, you really do want to know why the model is arriving at its conclusions. Maybe just so that you can stop spending money collecting data it doesn't care about, maybe because you need to be able to show some judge that, NO, the model isn't deducing the race of people from their names in order to racially discriminate. But you need to know.
Yes, that is the usual assumption. It frequently fails because those "absolutely huge" amounts of data are not truly independent, meaning that your real underlying training data set is a lot smaller than it appears.
The second common point of failure is leaving the variables unpruned while ignoring the necessary corrections to the statistical significance test. XKCD has the best explanation of this problem I've ever found. Unfortunately, they stop short of the fix. Look up the "Bonferroni correction" for one statistical take on how to fix the multiple-hypothesis problem.
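A quick illustration of the multiple-comparisons trap and the Bonferroni fix, on synthetic data where the null hypothesis is true by construction:

```python
# Run many t-tests on samples drawn from the SAME distribution: at alpha = 0.05
# roughly 5% come back "significant" by chance; Bonferroni (alpha / n_tests)
# removes nearly all of them. Synthetic data, for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_tests, n, alpha = 100, 50, 0.05

pvals = np.array([stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
                  for _ in range(n_tests)])

print("naive hits at alpha:", (pvals < alpha).sum())              # ~5 false positives
print("hits after Bonferroni:", (pvals < alpha / n_tests).sum())  # usually 0
```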
They [The American people] condemn racism precisely for its refusal to treat people as individuals, and they resist remedies grounded in race or gender for the same reason.
The part before the comma is true. The part after the comma is right-wing farce. American right wingers are a group which encompasses most of the remaining anti-black racists in America—Alabama Senator Tuberville and his intended audience, for instance. They deliver a notable and vociferous fraction of right wing advocacy. Those Americans condemn remedies grounded in race because such remedies promise to accomplish anti-racist goals. Committed racists are determined to resist anti-racist goals, and unreflective racists find them distasteful.
Both groups understand that alternative remedies grounded in assessment of individuals require literally impossible person-by-person evaluations of the contents of the human heart, and are thus doomed to failure. Failure for anti-racism is the result both groups prefer.
More generally, the notion that group-based remedies for racism are fading, and doomed to disappear, is mostly an alternative expression of right wing preference. As right wingers have got better organized, and more judicially influential, they have done what they can to gut group remedies as a means to evaluate and ameliorate effects of racism.
They have had some success. But not among the people most damaged by residual racism. Those don't think now, any less than they did before, that group remedies should be abandoned. And they will never think that until remedies for racism are no longer needed. Until then, they will never stop advocating for them, either. And they cannot be made to shut up, except by the most virulent kind of racist oppression.
So the right wing dream to get past group remedies and group assessments—and make everyone focus instead on futile assessments of individual moral character—can never be accomplished. So long as racism—never forget, a systematic manifestation, not an individual one; that “ism” on the end of the word literally connotes systematic action—delivers systematically predictable results which disadvantage its targets, there will be no substitute for group-based assessments and remedies for those results. Nothing else can work. The targets will not cease to protest, to organize, and to demand remedies which can work.
Indeed some racist clowns will demand reparations forever, but we will tire of that and learn to unconcernedly ignore them.
Which group remedies were popular and successful (in the US) in the past?
All of them to some degree.
Unless you're arguing that they should have 100% success (which is a dumb argument).
Anyone who writes ~”all group remedies have been successful” isn’t qualified to identify dumb arguments.
As I said, we will stop tolerating this racist "anti-racist" nonsense when the larger battle is won. Adler worries on about a nothingburger, which is the inability to give a short description of why AI might match its results to and instantiate racist quotas. But "wholistic admissions" is equally impenetrable, yet that is no barrier to demonstrating that it produces exactly that result.
Even in California, the voters rejected such upside-down, Orwellian logic (as the post itself points out).
Lord knows, it’s not because advocates of racial preferences were censored. The voters were bombarded with racist propaganda and rejected it.
The secret ballot probably helped, so that voters could express their real views without fear of being cancelled.
Even on the left, people don't like affirmative action in government hiring. I suspect not too many tears will be shed if SCOTUS eliminates it in public education as well, as long as private education isn't swept up in it. People do like lite affirmative action in contracting, along the lines of "buy American... all else being equal." But no one thinks that's moving any needles, so it's not a priority.
That doesn't mean people on the left reject all group-based remedies. The Voting Rights Act, for example, is very popular.
Even on the left, people don’t like affirmative action in government hiring.
I don't want to be too pointed about this, but that is a notably restricted use of "people." Pretty sure folks in Prince George's County, MD generally disagree with you.
Also, you seem to suppose majoritarian rejection of affirmative action is inherently legitimate. Can you see any problem with thinking that way?
It's significant that a majority of Californians, when free to express their opinion in the privacy of the voting booth, opposed racial preferences (let's call them what they are). The fact that it's a majority isn't significant in itself, but the quality of the pro-preference arguments is indicated by the fact that these arguments didn't fly even with a remarkably woke electorate.
You're conflating two different problems.
The first problem is models that are trained on biased data. This is what caused the initial meme over biased AI. Imagine Google training its facial recognition on all of its white male employees. It's just not going to work well on other demographics. I think that kind of thing has mostly been fixed, but it's still worth being vigilant about.
The second problem is AI telling you things you don't want to hear. This is just a policy problem. If you have a policy you want to do, and you create an AI to help with it, but the AI produces results that are contrary to the policy, then I think it's fine to say well, that AI didn't do what we wanted it to do. Let's fix it.
If policymakers are using the concern about biased AI as a hook on which to hang an otherwise inappropriate disparate impact policy, yeah that's bad, but it doesn't seem like that's what's happening. With the ADPPA, just take the AI out of it; it's an implementation detail. If the ADPPA wants a policy around disparate impact, it can and should do that without reference to the technology involved.
The "AI Bill of Rights" doesn't impose any such policy. It really is just a set of guidelines for how to avoid those early problems we had with white-guy-only facial recognition systems, which have already been adopted by the industry for the most part (like ensuring representative training data).
If you want to see how things work with a subgroup of people, any subgroup, you need a certain number of participants from that subgroup in your experiment to get a reliable answer for that subgroup. For example, if you want to find out if a drug works better in people with a relatively uncommon gene type, you may have to overrepresent that gene type to get a reliable answer.
Is that a quota? Is it discrimination? A great deal of medicine, and for that matter science generally, would be impossible if scientists were not allowed to overrepresent particular subgroups of interest in order to get a reliable answer specific to that subgroup. The whole burgeoning field of biologically targeted therapies, among many other fields, would be illegal.
What in the world is the difference constitutionally between subgroups based on particular gene variants of interest and subgroups based on characteristics like race? There may be a difference in degree. Perhaps Professor Bernstein is right that, as biological classifications go, race typologies are less reliable than genetic typologies. But there is no defensible difference in kind. Biological classifications are biological classifications.
If you look at the actual 14th amendment, I don't think it actually gets in the way of using genetics correlated with race for medical decisions, including study participation. Technically, it doesn't even apply to private discrimination at all. But what it prohibits by government is refusing equal protection of the law, and getting a tailored medical treatment is hardly that, unless the government mandates that it be tailored to be LESS effective based on race.
But once you have a patient in front of you, you're pretty much always going to be better going off of their actual genetics, rather than just going with some weak 'racial' correlation to those genetics.