Most public conversations about AI bias get stuck on the same set of examples — facial recognition, hiring algorithms, criminal risk scores. Immigration adjudication is, in some ways, the most consequential domain that almost never makes the headlines. The decisions are life-altering, the appeal channels are limited, and the systems are increasingly automated. The result is a set of harms that are easy for vendors and policy people to overlook, and very hard for the people on the receiving end to push back on.
I want to be direct about my position. I have built and deployed AI in immigration practice. I think it is, overall, going to be net positive — particularly for representation access, which has always been the binding constraint on the system. But the path to that good outcome runs through some specific places where bias has to be designed against. If you are building or buying in this space, these are the failure modes to watch.
Where bias enters the pipeline
Training data drawn from biased outcomes
Models trained on prior adjudications inherit the patterns of those adjudications, including the parts that are not just. A model trained on historical asylum grants will reflect the well-documented variance in grant rates between immigration courts and judges. If that model is used to triage cases, it can quietly entrench the same disparities, dressed up in algorithmic neutrality.
Language and translation effects
Most immigration AI is trained on English-language filings. Cases that originate in another language pass through a translation layer first. Translation quality varies by language pair — Spanish is well-served, Haitian Creole and Karen are not. Downstream models then make decisions on text that is, for some applicants, materially less faithful to the original. The bias is not in the model; it is in the pipeline. Vendors who do not test by language pair will not know it exists.
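Testing by language pair does not require sophisticated tooling. Here is a minimal sketch of what disaggregated evaluation looks like; the record fields (`source_language`, `prediction`, `label`) are illustrative, not drawn from any particular system:

```python
# Sketch: disaggregating model accuracy by the language a case
# originated in. Field names are hypothetical placeholders.
from collections import defaultdict

def accuracy_by_language(records):
    """Per-language accuracy for a list of evaluation records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        lang = r["source_language"]
        total[lang] += 1
        if r["prediction"] == r["label"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

records = [
    {"source_language": "es", "prediction": 1, "label": 1},
    {"source_language": "es", "prediction": 0, "label": 0},
    {"source_language": "ht", "prediction": 1, "label": 0},
    {"source_language": "ht", "prediction": 0, "label": 0},
]
print(accuracy_by_language(records))  # {'es': 1.0, 'ht': 0.5}
```

A vendor who cannot produce a table like this for every language pair they serve has not tested for the pipeline bias described above.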
Document-quality bias
Wealthier applicants arrive with better documentation — apostilled certificates, translated medical records, professional photographs. Document-quality signals can correlate strongly with claim merit in a model's eyes, even though they correlate more strongly with applicant resources. Without explicit guardrails, the model penalizes poverty.
Selection bias in the cases the model sees
If a model is trained on cases from one kind of firm — large firms with paying clients — it will not generalize well to pro bono or legal-aid casework. The applicants most in need of high-quality representation are exactly the ones the model is least likely to be calibrated for.
Three categories of harm we have observed
- Disparate accuracy. The model is meaningfully more accurate for some applicant populations than others. Often invisible until somebody runs the numbers by demographic.
- Confidence misuse. The model returns a confidence score; the operator treats high confidence as a green light to skip review. The errors that slip through are concentrated on the populations the model is worst at.
- Procedural drift. Once the model is in the workflow, the human reviewer's behavior shifts. They review the model's outputs faster, push back less often, and override less. The bias becomes structural even if the model improves.
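The first two harms are measurable before they bite. As one concrete check on confidence misuse: compute the error rate among high-confidence predictions, broken out by subgroup. This is a sketch under assumed field names, not a real system's schema:

```python
# Sketch: is "high confidence" equally trustworthy across subgroups?
# Field names (subgroup, confidence, prediction, label) are hypothetical.
def high_confidence_error_rate(records, threshold=0.9):
    """Per-subgroup error rate among predictions at or above
    the confidence threshold (the ones an operator might skip
    reviewing)."""
    stats = {}  # subgroup -> (errors, total)
    for r in records:
        if r["confidence"] < threshold:
            continue
        group = r["subgroup"]
        errors, total = stats.get(group, (0, 0))
        total += 1
        if r["prediction"] != r["label"]:
            errors += 1
        stats[group] = (errors, total)
    return {g: errors / total for g, (errors, total) in stats.items()}
```

If the numbers differ meaningfully between groups, a confidence-gated review workflow will concentrate its misses exactly where the model is weakest.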
What 'auditing' actually means in this domain
Most vendor-published audits are not real audits. They are accuracy reports on a held-out sample, often with no demographic disaggregation. A real audit in immigration AI requires three things:
- Representative test sets, intentionally stratified by language, country of origin, case type, and document quality.
- A measurable definition of fairness — equal accuracy across subgroups, equal false-positive rates, or equal calibration.
- An independent reviewer who is not the vendor and not the operator.
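For concreteness, here is a sketch of those fairness checks computed per subgroup: accuracy, false-positive rate, and a simple calibration gap (mean predicted probability minus observed positive rate). All field names are assumptions for illustration:

```python
# Sketch: per-subgroup accuracy, false-positive rate, and calibration
# gap. Field names (subgroup, prediction, label, probability) are
# hypothetical placeholders, not any vendor's schema.
def audit_metrics(records):
    groups = {}
    for r in records:
        groups.setdefault(r["subgroup"], []).append(r)
    out = {}
    for g, rs in groups.items():
        n = len(rs)
        acc = sum(r["prediction"] == r["label"] for r in rs) / n
        negatives = [r for r in rs if r["label"] == 0]
        # FPR is undefined for a group with no negative labels.
        fpr = (sum(r["prediction"] == 1 for r in negatives) / len(negatives)
               if negatives else None)
        mean_p = sum(r["probability"] for r in rs) / n
        pos_rate = sum(r["label"] == 1 for r in rs) / n
        out[g] = {"accuracy": acc, "fpr": fpr,
                  "calibration_gap": mean_p - pos_rate}
    return out
```

An audit report is just this table, computed on a stratified test set, signed off by someone independent. Anything less is a marketing document.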
Most vendor relationships do not survive that requirement. Be willing to walk away.
A pragmatic stance for vendors and firms
- Disclose. Tell users what the model was trained on, what populations it has been tested against, and what its known weaknesses are.
- Assist, do not decide. AI in immigration should accelerate human judgment, not replace it. Models that produce decisions instead of recommendations belong in research, not in production.
- Re-audit on a cadence. Drift is real. A model audited at deployment is not an audited model two years later.
- Build the appeal channel into the product. If a user wants to push back on a model output, the path should be shorter than the path that produced it.
Why this matters more than the headlines suggest
Immigration AI is being deployed in a system that already produces disparate outcomes. That makes the bar for additional algorithmic bias higher, not lower. It also makes the upside higher — a well-built tool can raise the floor for unrepresented applicants, who are currently the worst-served users of the system. Both are true at once. The vendors who internalize that — and the firms who insist on it — are the ones I would trust with the most consequential paperwork of someone else's life.


