The controversy surrounding COMPAS exposes a significant gap between computer science and social science in understanding bias, highlighting the need to align computational fairness metrics with humanistic interpretations. In response, we introduce the EVENS benchmark to align the Equality versus Equity Notion Spectrum in LLMs. Our contributions include constructing an equality-equity notion spectrum and generating a corresponding dataset of key fairness scenarios, evaluating models' initial stances and test stance adjustments under external legal regulations and internal organizational regulations using Retrieval-Augmented Generation (RAG), introducing Chain-of-Thought (CoT) prompting to guide fairness reasoning, and adding an uncertain choice to assess its impact. Our findings indicate that LLMs initially favor equality over equity. Incorporating legal and organizational regulations of equity through RAG can reduce proportional equality in most models and enhance equity recognition in GPT4o significantly. CoT improves the equity reasoning of Chinese models but may also rationalize existing biases, and the uncertain option promotes more cautious responses. The links to the code and datasets: https://github.com/CrexCheng/EVEN.
Chen, Q., Cheng, R., Liu, Y., Xie, Z., Zhao, K., Li, P., et al. (2026). EVENS: Equality versus Equity Notion Spectrum of LLMs. Association for Computing Machinery, Inc [10.1145/3769126.3769262].
EVENS: Equality versus Equity Notion Spectrum of LLMs
Chen Q.;Cheng R.;Xie Z.;Rotolo A.;
2026
Abstract
The controversy surrounding COMPAS exposes a significant gap between computer science and social science in understanding bias, highlighting the need to align computational fairness metrics with humanistic interpretations. In response, we introduce the EVENS benchmark to align the Equality versus Equity Notion Spectrum in LLMs. Our contributions include constructing an equality-equity notion spectrum and generating a corresponding dataset of key fairness scenarios, evaluating models' initial stances and test stance adjustments under external legal regulations and internal organizational regulations using Retrieval-Augmented Generation (RAG), introducing Chain-of-Thought (CoT) prompting to guide fairness reasoning, and adding an uncertain choice to assess its impact. Our findings indicate that LLMs initially favor equality over equity. Incorporating legal and organizational regulations of equity through RAG can reduce proportional equality in most models and enhance equity recognition in GPT4o significantly. CoT improves the equity reasoning of Chinese models but may also rationalize existing biases, and the uncertain option promotes more cautious responses. The links to the code and datasets: https://github.com/CrexCheng/EVEN.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


