DOJ Report Reveals Amazon Knew Its Facial Recognition Failed on Black Faces

Unlike most technology stories, the ACLU's 2018 test of Amazon Rekognition has stayed in the public discourse. The civil liberties group used the facial recognition service to compare photos of all 535 members of the US Congress against a database of 25,000 publicly available mugshots. The algorithm returned 28 false matches: members of Congress mistakenly identified as having criminal arrest histories. Six of the 28 were members of the Congressional Black Caucus. The US Congress, mistakenly labeled as suspects. It is easy to imagine what that kind of error would mean in a real-world law enforcement situation: not a demonstration, but an actual investigation in which a police agency was using the software to identify someone at a crime scene.

Amazon Rekognition launched in 2016 as a commercial AI service that lets companies and government agencies analyze photos and videos for face detection, identity matching, and object recognition. Because AWS managed the underlying model, it was marketed as fast, accurate, and easy to adopt: the kind of service that could be deployed without in-house machine learning expertise. Customers included police departments around the country, which used it for suspect identification and surveillance footage analysis in ways the company initially framed as straightforward applications of a dependable technology. Independent testing and research published over the following years showed that the dependability was not evenly distributed across the population the technology was being used to identify.
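For context, this is roughly what a Rekognition face comparison looks like through the AWS SDK for Python (boto3). It is a minimal sketch, not anything from Amazon's internal pipeline: the image file names are placeholders, and AWS credentials and region are assumed to be configured in the environment.

```python
# Minimal sketch of a one-to-one face comparison with Amazon Rekognition
# via boto3. "probe.jpg" and "candidate.jpg" are placeholder file names.
import boto3

client = boto3.client("rekognition")

with open("probe.jpg", "rb") as source, open("candidate.jpg", "rb") as target:
    response = client.compare_faces(
        SourceImage={"Bytes": source.read()},
        TargetImage={"Bytes": target.read()},
        SimilarityThreshold=90,  # only return matches scoring 90% or above
    )

# Each returned match carries a similarity score; faces scoring below the
# threshold are omitted from FaceMatches entirely.
for match in response["FaceMatches"]:
    print(f"Match with similarity {match['Similarity']:.1f}%")
```

The simplicity of that call is part of the story: nothing in the API requires the caller to know anything about how the model behaves across demographic groups.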

Key Reference & Investigation Information

Topic: Amazon Rekognition facial recognition racial bias (DOJ and research findings)
Technology: Amazon Rekognition, an AI facial recognition and analysis tool
Company: Amazon Web Services (AWS) / Amazon.com Inc.
Key bias finding: Higher error rates on Black and darker-skinned faces than on white faces
ACLU test result: 28 members of Congress falsely matched to a mugshot database; 6 were members of the Congressional Black Caucus
MIT study (2019) error rate: Up to 31% on darker-skinned female faces
NIST confirmation: Many facial recognition algorithms misidentified African-American and Asian faces up to 100x more often than Caucasian faces
Amazon's position: Argued independent studies misused the technology; claimed internal accuracy was higher
Police use moratorium: Announced June 2020; one-year pause on law enforcement use
Current status: Continued use in non-law-enforcement commercial applications
Civil rights organizations: ACLU and the Congressional Black Caucus, the key critics driving public scrutiny
Reference website: NIST Facial Recognition Research, nist.gov/programs-projects/face-recognition

The 2019 MIT study, which extended Joy Buolamwini's earlier "Gender Shades" research, validated what the ACLU test had indicated: Amazon's facial analysis technology performed worst on darker-skinned female faces, with error rates reaching 31% in some test settings. That figure, nearly one misidentification in three for a particular group, is not a margin of error. It is a systematic performance failure along racial and gender lines that made the technology unreliable for a sizable segment of the population it was used to examine. Data later released by the National Institute of Standards and Technology showed that many facial recognition algorithms, including systems comparable to Rekognition, misidentified Asian and African-American faces at rates up to a hundred times higher than Caucasian faces. The government's own data backed up the arguments civil rights organizations had been making from outside.
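The methodology behind findings like these is a disaggregated evaluation: rather than reporting one overall accuracy number, the auditor computes error rates separately for each demographic subgroup. A minimal sketch of that calculation, using hypothetical labeled outcomes as stand-ins rather than any real benchmark data:

```python
# Sketch of a disaggregated audit: error rate per demographic subgroup
# instead of a single aggregate accuracy. The records below are
# hypothetical stand-ins for labeled test outcomes, not real data.
from collections import defaultdict

# Each record: (subgroup label, whether the system's output was wrong)
results = [
    ("darker_female", True), ("darker_female", False), ("darker_female", True),
    ("darker_male", False), ("darker_male", True), ("darker_male", False),
    ("lighter_female", False), ("lighter_female", False), ("lighter_female", False),
    ("lighter_male", False), ("lighter_male", False), ("lighter_male", False),
]

errors = defaultdict(int)
totals = defaultdict(int)
for subgroup, is_error in results:
    totals[subgroup] += 1
    errors[subgroup] += int(is_error)

for subgroup, total in totals.items():
    print(f"{subgroup}: {errors[subgroup] / total:.0%} error rate over {total} samples")
```

An aggregate number can look respectable while masking a 31% failure rate in one subgroup, which is exactly why the Gender Shades line of research reported results this way.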

Amazon's response unfolded in stages. The company disputed the independent studies' methodology, claiming that internal testing showed higher accuracy across demographic groups and that researchers had used the system in ways that did not reflect proper configuration. Amazon was right that the ACLU test in particular ran at Rekognition's default 80% confidence threshold rather than the 99% its own guidance recommended for high-stakes identification, and there is a fair case that other independent tests did not use ideal settings either. But the counterarguments never addressed the fundamental problem: a technology actively marketed to law enforcement for identity verification performed significantly worse on Black faces than on white ones, and police departments using it were making decisions about real people based on outputs that were demonstrably less reliable for certain demographics.
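The threshold dispute is concrete in the API: Rekognition's search calls accept a minimum match confidence, and the default admits far looser matches than the 99% Amazon recommends for law enforcement. A hedged boto3 sketch of the difference, with the collection ID and image path as placeholder names and a pre-populated face collection assumed:

```python
# Sketch of how the confidence threshold changes what Rekognition returns.
# "mugshot-collection" and "member-photo.jpg" are placeholder names; the
# collection is assumed to already contain indexed faces.
import boto3

client = boto3.client("rekognition")

with open("member-photo.jpg", "rb") as f:
    image_bytes = f.read()

# 80 is the API default; 99 is Amazon's published guidance for law enforcement.
for threshold in (80, 99):
    response = client.search_faces_by_image(
        CollectionId="mugshot-collection",
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=10,
    )
    print(f"Threshold {threshold}: {len(response['FaceMatches'])} candidate matches")
```

The design choice matters because the stricter setting is opt-in: nothing in the service prevented a police customer from running identifications at the default, exactly as the ACLU did.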


Amazon announced the moratorium on police use in June 2020, in the midst of the national reckoning that followed George Floyd's death. The intense scrutiny of the relationship between technology and law enforcement made the status quo untenable for companies that had watched the controversy build without fully addressing it. Amazon framed the pause as a chance for Congress to write clearer rules governing facial recognition, a stance that was both genuinely reasonable as policy and conveniently shifted the burden of defining acceptable use from the company to lawmakers. IBM and Microsoft made similar announcements around the same time, a sign that the industry was collectively backing away from law enforcement applications at a moment when the political climate made staying in that market unpalatable.

The moratorium did not make the underlying bias concerns go away; it moved them to different applications and different affected populations. Rekognition remains in use in non-law-enforcement commercial settings such as media analysis, access control, and content moderation. The same algorithm that misidentified Black members of Congress as criminal suspects when tested against a mugshot database still analyzes images in contexts where the consequences of an error may be less immediately severe, but no less real for the people affected. It is difficult to ignore that the moratorium resolved the most visible political problem while leaving the technical problem, disproportionate error rates across demographic groups, as someone else's to solve.

The Rekognition story is ultimately less about one company's product than about a recurring pattern: systems designed and trained in ways that encode existing inequities are deployed at scale before those inequities are fully understood, with the heaviest consequences falling on the people who were already most vulnerable to them. Technology moves quickly. The reckoning usually arrives much later.
