Responsible AI · Fairness Report · June 2026

Tested for bias at every stage. None found.

Counterfactual identity-swap testing was conducted across every AI evaluation stage in Avya's hiring pipeline. This page documents the methodology, findings, and their honest limits — and lets you run the test yourself.


3,700+ individual scored evaluations

5 identity dimensions varied

4 stages audited incl. blind control

Warning signals — all refuted

Leads surviving confirmation

±2 pt MDE, application screening

Source: Zeko AI Internal Fairness Study, June 2026 — 3,700+ scored evaluations, 4 pipeline stages, 5 identity dimensions, pre-registered.

Interactive · Based on real study data

Try to make Avya evaluate them differently.

Two candidates. Identical answers. Change who Candidate B appears to be — then change what they said.

ⓘ Scores representative of study findings (baseline 74/100). Sample answers illustrative.

Score gap between A & B: 0 points difference

Identity had no effect

No swaps made yet

A · 🔒 Reference · Rahul Kumar · 74/100

Senior Backend Engineer · Java

Answer — system design

"For the payment service, I'd implement the saga pattern with compensating transactions. Each step publishes an event; downstream services subscribe and roll back on failure."

✓ Content locked

Education: IIT Delhi · T1

Location: Delhi · Metro

Language: Standard EN

Community: Sharma

B · ← You control · Rahul Kumar · 74/100

Senior Backend Engineer · Java

Answer — system design

"For the payment service, I'd implement the saga pattern with compensating transactions. Each step publishes an event; downstream services subscribe and roll back on failure."

✓ Identical to Candidate A

Education: IIT Delhi · T1

Location: Delhi · Metro

Language: Standard EN

Community: Sharma

Change who Candidate B appears to be. Try different names, backgrounds, languages. Watch the score gap: the study found no measurable gap across 3,700+ scored evaluations.

Name & gender signal

Rahul (Male) · Priya (Female) · Alex (Neutral) · Kavya (Female) · Arjun (Male)


Methodology

How the test was designed.

The counterfactual swap method is borrowed from controlled experimental design — change one variable, hold everything else constant, measure the effect.

Counterfactual identity-swap test — schematic

🔒 Application content: "Saga pattern, compensating transactions, event-driven rollback…" (held byte-for-byte constant across all conditions)

→ Identity conditions:

Rahul · IIT Delhi · Metro · Sharma

Priya · NIT · Non-metro · Iyer

Khan · Private · Small town · Code-switched

+ 12 more identity conditions tested

→ Evaluation engine: Avya (production config, unmodified)

→ Scores: Rahul 74 · Priya 74 · Khan 74 (all within ±1 pt across conditions)

→ Gap: no measurable identity effect within MDE ±2 pts

Pre-registered · Repeated N× · Perception-checked · Confirmed at scale

Analysis plan fixed before data was seen. Each condition replicated many times.
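The schematic above can be sketched in a few lines of Python. This is a minimal illustration, not Avya's implementation: `Application`, `swap_gaps`, and `toy_score` are hypothetical names, the field values are taken from the examples on this page, and the toy scorer only stands in for the production evaluation engine.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import Callable, Dict


@dataclass(frozen=True)
class Application:
    answer: str      # the work itself, held constant in every condition
    name: str        # identity signals below are the only things varied
    education: str
    location: str
    community: str


def swap_gaps(
    base: Application,
    variants: Dict[str, Dict[str, str]],
    score: Callable[[Application], float],
    runs: int = 3,
) -> Dict[str, float]:
    """Return mean(variant score) - mean(reference score) per condition."""
    base_mean = mean(score(base) for _ in range(runs))
    gaps = {}
    for label, identity_fields in variants.items():
        variant = replace(base, **identity_fields)
        if variant.answer != base.answer:
            raise ValueError("content must stay identical across conditions")
        gaps[label] = mean(score(variant) for _ in range(runs)) - base_mean
    return gaps


def toy_score(app: Application) -> float:
    # Hypothetical stand-in scorer: looks only at the answer text.
    return 74.0 if "saga pattern" in app.answer.lower() else 50.0


base = Application(
    answer="I'd implement the saga pattern with compensating transactions.",
    name="Rahul", education="IIT Delhi", location="Metro", community="Sharma",
)
variants = {
    "Priya": dict(name="Priya", education="NIT",
                  location="Non-metro", community="Iyer"),
    "Khan": dict(name="Khan", education="Private",
                 location="Small town", community="Khan"),
}
gaps = swap_gaps(base, variants, toy_score)
print(gaps)
```

Because the toy scorer never reads the identity fields, every gap here is zero by construction. With a real evaluator the gaps are the quantity of interest, and they would be compared against the stage's MDE rather than against exact zero.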

Stage · Identity exposure · Sample scale · MDE precision · Perception check · Verdict

Application screening (Stage 01) · High · 15 × 3 × 3 · ±2 pt · ✓ Passed · No effect

Competency evaluation (Stage 02) · Low · Many runs · ±1 pt · ✓ Passed · No effect

Post-interview screening (Stage 03) · Medium · Many runs · ±23 pt · ✓ Passed · No effect

Technical evaluation (Control stage) · None · Control · — · N/A · Baseline

Honest limits

What this report does not claim.

A report containing only favourable findings would be the less credible one.

📐

Consistency, not real-world outcomes

The study measured whether identical work earns identical evaluation — a necessary condition for fairness, not a sufficient one. Downstream hiring outcomes on live populations are a separate ongoing effort.

🔬

A precision floor exists

The tests reliably detect score differences at or above each stage's minimum detectable effect (MDE). Smaller differences need more data to confirm or rule out. Application screening MDE is ±2 points. Results are presented against that yardstick, not as proof of absence.
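To make the precision floor concrete, here is a rough sketch of how an MDE can be computed for a paired design using the standard normal-approximation formula. The report does not state which formula, significance level, power, or variance the study used, so `paired_mde` and every number below are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def paired_mde(sd_diff: float, n_pairs: int,
               alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest mean score difference detectable with the given power.

    Normal approximation for a two-sided paired test:
        MDE = (z_{1 - alpha/2} + z_{power}) * sd_diff / sqrt(n_pairs)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd_diff / sqrt(n_pairs)


# Hypothetical inputs, not study data: paired score differences with a
# standard deviation of 4 points, measured on 32 matched pairs.
example_mde = paired_mde(sd_diff=4.0, n_pairs=32)
print(round(example_mde, 2))  # about 1.98 points
```

The formula also shows why very small effects need more data: the MDE shrinks with the square root of the sample size, so halving it takes four times as many pairs.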

📅

A point-in-time snapshot

Models and prompts evolve. A result valid today is not a permanent certificate. Evaluations are re-run whenever models or prompts change. This is a repeating process, not a one-off report.

↻

Fairness testing at Zeko AI is continuous, not ceremonial. Evaluations are re-run when models or prompts change, the set of candidates and scenarios widens over time, and findings are published openly — including inconvenient ones.
