Largest study of AI hiring algorithms to date finds ‘clear racial disparities’

The most comprehensive independent study of AI-powered hiring algorithms to date has found stark racial disparities in the tools used to screen millions of job applicants. More than one in four applications submitted by Black job seekers went to positions where the algorithm's outcomes fall below the threshold that triggers federal discrimination scrutiny.

The paper, “Algorithmic Monocultures in Hiring,” was authored by researchers at Stanford University, Chapman University, and Northeastern University, and will be presented at the ACM Conference on Fairness, Accountability, and Transparency in Montreal next month. It analyzed more than 4 million job applications submitted by 3 million applicants across 156 employers, mostly companies with annual revenue of $5 billion or more. All of the applications were screened by algorithms built by the same vendor, a talent platform called Pymetrics.

“We find clear racial disparities in applicant outcomes,” the authors write.

“As a single vendor comes to dominate decision-making in a space, their quirks or shortfalls can be present across that entire sector in a way that wasn’t possible before,” Northeastern professor and research co-author Kathleen Creel told the Financial Times, which previously reported on the study.

Pymetrics’ owner, Harver, did not respond to a request for comment.

How the algorithm works—and where it breaks down

Pymetrics, which was acquired by Harver in 2022 and whose algorithms are used by major employers across finance, manufacturing, and technology, screens applicants not through resumes but through a battery of online games designed to measure traits like risk tolerance, processing speed, and altruism. The company has long marketed this approach as more objective than traditional resume screening, and its own prior analysis found no disparities that rose to the level of legal scrutiny.

The new research challenges that conclusion — not by disputing Pymetrics’ math, but by arguing the company was asking the wrong question.

Pymetrics had measured bias by pooling all of its applicants and outcomes together, across all employers and positions. The Stanford-led team instead analyzed each of the 1,746 individual positions separately, which is how U.S. employment discrimination law — specifically the Equal Employment Opportunity Commission’s so-called “four-fifths rule” — is actually designed to be applied.

When analyzed position by position, 10.62% of jobs in the dataset showed an adverse impact on Black applicants, meaning the algorithm recommended Black candidates at less than four-fifths (80%) of the rate for the most-selected racial group, the federal threshold. Thirty percent of Black applicants applied to at least one such position. And 25.87% of all applications submitted by Black applicants, nearly 40,000 submissions, were for positions where the algorithm produced what federal guidelines treat as evidence of adverse impact.

Asian applicants were also significantly affected: 14.74% of their applications went to positions where outcomes fell below the same federal threshold.

“Aggregating from individual positions to occupation groups suffices to mask the per-position adverse impact,” the authors write, calling the practice of reporting only aggregate results an “improper, or at minimum an incomplete,” interpretation of federal guidance.
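To see how pooling can hide a per-position problem, consider a minimal Python sketch of the four-fifths check. The group labels, the two positions, and every count below are invented for illustration and are not taken from the study; the function names are likewise hypothetical and do not reflect how Pymetrics or the researchers implemented their analysis.

```python
from dataclasses import dataclass
from typing import Dict, List

FOUR_FIFTHS = 0.8  # EEOC rule-of-thumb threshold for adverse impact


@dataclass
class GroupCounts:
    applicants: int
    selected: int

    @property
    def rate(self) -> float:
        # Selection rate for this group; 0.0 if nobody applied.
        return self.selected / self.applicants if self.applicants else 0.0


def impact_ratio(counts: Dict[str, GroupCounts], group: str) -> float:
    """Selection rate of `group` divided by the highest group selection rate."""
    top = max(c.rate for c in counts.values())
    return counts[group].rate / top if top else 1.0


def has_adverse_impact(counts: Dict[str, GroupCounts], group: str) -> bool:
    return impact_ratio(counts, group) < FOUR_FIFTHS


def pool(positions: List[Dict[str, GroupCounts]]) -> Dict[str, GroupCounts]:
    """Sum applicants and selections across positions (the aggregate view)."""
    pooled: Dict[str, GroupCounts] = {}
    for counts in positions:
        for group, c in counts.items():
            total = pooled.setdefault(group, GroupCounts(0, 0))
            total.applicants += c.applicants
            total.selected += c.selected
    return pooled


# Invented example data: two positions, two hypothetical groups.
position_a = {"group_a": GroupCounts(100, 20), "group_b": GroupCounts(100, 40)}
position_b = {"group_a": GroupCounts(100, 50), "group_b": GroupCounts(100, 40)}

for name, counts in [("position A", position_a),
                     ("position B", position_b),
                     ("pooled", pool([position_a, position_b]))]:
    print(name, round(impact_ratio(counts, "group_a"), 3),
          has_adverse_impact(counts, "group_a"))
```

In this toy data, position A selects group_a at half the rate of group_b and fails the check, while the pooled figures (35% versus 40%) pass it. That is the masking effect the authors describe, reduced to two made-up positions.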

The ‘Algorithmic Blackball’ effect

The study’s second major finding may be even more consequential for job seekers: the same vendor’s algorithms are so highly correlated across employers that being rejected by one company meaningfully predicts rejection by the next.

Researchers call this “systemic rejection.” Among applicants who applied to 10 positions screened by Pymetrics, 4% were rejected from every single one — a rate statistically higher than what chance would predict if each employer were making independent decisions.

To put that in concrete terms: when an applicant plays Pymetrics’ assessment games, their scores are stored and reused for up to 330 days. If two different companies both use Pymetrics, an applicant isn’t really getting two separate evaluations — they’re getting the same score, twice. Some applicants are, in effect, algorithmically locked out of multiple companies at once without knowing it.

The researchers describe this as an “algorithmic blackball” — a term previously theorized in academic literature but never before documented at this scale in deployed real-world data.

To understand how deep the problem runs, the team ran a large-scale simulation, exploiting the fact that algorithms — unlike human reviewers — produce the same output for the same input every time. They asked Pymetrics to run its models on a sample of 1,000 applicants against every applicable position in the dataset. The good news: no applicant was rejected by all models. The bad news: to reduce the probability of being systemically shut out to below 0.1%, an applicant would need to apply to at least 25 different positions — more than double the 10 applications that would suffice if hiring decisions were made independently.
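The arithmetic behind that comparison can be sketched in a few lines of Python. This is an illustration only, not the researchers' simulation: the 50% rejection rate and the two-type applicant mix are assumptions chosen to make the numbers easy to follow, and the function names are hypothetical. Correlation is modeled here with a simple mixture, in which some applicants face a high rejection probability everywhere and others a low one.

```python
from typing import List, Tuple

# Each entry is (share of applicants, per-application rejection probability).
# Both scenarios have the same overall rejection rate of 50% (an assumption).
INDEPENDENT: List[Tuple[float, float]] = [(1.0, 0.5)]
CORRELATED: List[Tuple[float, float]] = [(0.5, 0.8), (0.5, 0.2)]


def all_rejected_prob(n: int, mix: List[Tuple[float, float]]) -> float:
    """Probability that a randomly drawn applicant is rejected by all n positions."""
    return sum(share * p ** n for share, p in mix)


def min_applications(mix: List[Tuple[float, float]],
                     target: float = 0.001,
                     limit: int = 10_000) -> int:
    """Smallest n for which the all-rejected probability drops below `target`."""
    for n in range(1, limit + 1):
        if all_rejected_prob(n, mix) < target:
            return n
    raise ValueError("target not reached within limit")


print("independent:", min_applications(INDEPENDENT))
print("correlated: ", min_applications(CORRELATED))
```

Under these assumed numbers, independent decisions need 10 applications to push the chance of a clean sweep of rejections below 0.1%, while the correlated mix needs 28. The exact figures depend entirely on the invented parameters; the point is only that when the same applicants tend to be rejected everywhere, spreading applications around buys less protection.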

And, the authors note, a Pymetrics recommendation only gets an applicant into the pool of candidates reviewed by a human. It doesn’t guarantee an interview.

The concentration problem

The findings land at a moment when the AI hiring industry has become highly concentrated. As of May 2023, over 60% of the Fortune 100 and eight of the 10 largest U.S. federal agencies used algorithms from HireVue, another hiring-technology vendor, according to the paper. The authors warn that this concentration creates systemic risks beyond bias: if a single dominant vendor goes offline or is found to be producing discriminatory outcomes, hiring at thousands of employers could be disrupted simultaneously.

“By consolidating part of the hiring decision process across distinct employers, hiring algorithms impact collective adverse impact rates and patterns of systemic rejection,” the authors write.

Policy implications

The study arrives as regulators in both the U.S. and Europe are actively grappling with how to govern AI hiring tools. New York City passed Local Law 144 in 2021, the first legislation directly targeting algorithmic hiring. But the authors found that the city's existing guidance for the law appears to instruct auditors to pool data across positions and employers, exactly the aggregation method they argue masks disparities.

In Europe, the EU AI Act designates hiring algorithms as high-risk AI systems by default, with compliance requirements taking effect August 2, 2026 — just weeks away.

The authors make four policy recommendations: measure adverse impact at the position level; strengthen cross-employer market surveillance; monitor risks from algorithmic concentration; and create legal pathways for independent researchers to access hiring algorithm data, similar to provisions in the EU’s Digital Services Act that compel large platforms to share data with academics.

The last point carries an implicit warning. This study was only possible because Pymetrics voluntarily provided its data under an agreement that guaranteed the researchers’ independence. The authors acknowledge their findings could inadvertently discourage future data sharing by vendors who would prefer their algorithms remain opaque.

“Independent research is necessary to illuminate otherwise-opaque hiring algorithms,” they write. Without it, the racial disparities documented in this study — affecting tens of thousands of applicants across some of America’s largest companies — might never have come to light.

For this story, Fortune journalists used generative AI as a research tool. An editor verified the accuracy of the information before publishing.
