As artificial intelligence (AI) systems become more intertwined with daily life and decision-making, a pressing question looms: Can AI truly grasp the values that guide human choices? Traditional AI is engineered to optimize efficiency, accuracy and speed. Humans, by contrast, make decisions shaped by culture, social norms and lived experience, factors that are rarely captured in algorithmic design.
New research from the University of Washington suggests that machines could learn values by observing how people behave in different cultural contexts. This approach may lead to AI that is not only more contextually aware but also more trustworthy to diverse users.
Moving Beyond One-Size-Fits-All
Most AI systems today, including large language models and recommendation engines, are trained to maximize predefined outcomes. That often involves reinforcement learning, in which certain behaviors are rewarded and others penalized, baking a static set of priorities into the model's behavior. This method works for clearly defined tasks, such as translation or image classification, but it struggles with nuanced decisions involving human values, which differ widely across cultures.
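To make that concrete, here is a minimal sketch of conventional reinforcement learning, a hypothetical toy example rather than code from the study: a tabular Q-learning agent on a five-state chain whose priorities are fixed in advance by a hand-coded reward table.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2

# Hand-coded reward table: the designer's priorities, fixed in advance.
# Only being in the last state pays off.
reward = np.zeros((n_states, n_actions))
reward[4, :] = 1.0

def step(state, action):
    """Toy chain environment: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, reward[state, action]

Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: behavior is shaped entirely by the fixed reward.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The agent learns to head toward the rewarded state, and nothing else.
print("Greedy policy (1 = move right):", np.argmax(Q, axis=1))
```

Whatever the designer encodes in that reward table is what the agent optimizes; nothing in the training loop adapts those priorities to the people the system serves.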
The new study challenges this convention by exploring inverse reinforcement learning (IRL), in which an AI system infers values from observed human behavior rather than being handed a rulebook. To test the idea, the researchers designed a real-time, multi-agent online game that required trust-based and altruistic decisions.
Participants drawn from self-identified cultural groups played the game under conditions where no single strategy consistently maximized individual payoff. That structure allowed differences in cooperation, reciprocity and risk tolerance to emerge organically rather than being forced by the game design. The researchers then used the gameplay data to infer reward functions that capture latent preferences rather than surface-level actions.
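The paper's exact inference procedure isn't reproduced here, but the core idea of IRL can be sketched in a simplified one-step form. In the toy example below (all action features, payoff values and group labels are illustrative assumptions), each observed choice is treated as softmax-rational under an unknown linear reward over a player's own payoff and a partner's payoff, and the weights are recovered by matching feature expectations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Features per action: [own_payoff, partner_payoff]. "Share" sacrifices
# some of the player's own payoff for the partner; "keep" does the opposite.
actions = np.array([[1.0, 1.0],   # share
                    [2.0, 0.0]])  # keep

def simulate_choices(true_w, n=2000):
    """Generate synthetic gameplay from a group with latent weights true_w."""
    logits = actions @ true_w
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(actions), size=n, p=p)

def infer_weights(choices, lr=0.5, iters=300):
    """Fit reward weights by gradient ascent on the softmax log-likelihood,
    which amounts to matching empirical and model feature expectations."""
    w = np.zeros(2)
    emp = actions[choices].mean(axis=0)        # empirical feature expectations
    for _ in range(iters):
        logits = actions @ w
        p = np.exp(logits - logits.max())
        p /= p.sum()
        model = p @ actions                    # model feature expectations
        w += lr * (emp - model)
    return w

# Two hypothetical groups: one weights the partner's payoff heavily, one not.
for name, true_w in [("group A", np.array([1.0, 2.0])),
                     ("group B", np.array([2.0, 0.5]))]:
    w_hat = infer_weights(simulate_choices(true_w))
    print(name, "inferred weights (own, partner):", np.round(w_hat, 2))
```

With only two actions, the weights are identified only up to the difference that drives choice probabilities, so the output reflects each group's relative weighting of self versus partner rather than absolute values, which is enough to distinguish the two simulated groups.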
Those reward functions were used to train AI agents, which were then evaluated in novel scenarios within the same task environment. The results showed that agents trained on different groups’ data exhibited systematic behavioral differences, particularly in how strongly they prioritized collective outcomes over individual gain. Crucially, those differences persisted across new situations, indicating that the models learned stable preference signals rather than memorizing specific game states.
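Continuing the illustrative setup above, generalization can be sketched by handing an agent a group's inferred weights and placing it in a payoff structure it never saw during inference; because the policy depends only on the learned reward, the recovered preference signal carries over. The weights below are hypothetical stand-ins for the output of the inference step, not values from the study.

```python
import numpy as np

def policy(w, action_features):
    """Softmax policy under a learned linear reward."""
    logits = action_features @ w
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Novel payoff structure, unseen during inference:
# features per action are [own_payoff, partner_payoff].
novel = np.array([[0.5, 3.0],   # strongly altruistic option
                  [3.0, 0.5]])  # strongly selfish option

# Hypothetical inferred weights for two groups (stand-ins for the
# output of the inference step sketched earlier).
for name, w_hat in [("group A agent", np.array([0.4, 1.4])),
                    ("group B agent", np.array([1.6, 0.1]))]:
    p = policy(w_hat, novel)
    print(f"{name}: P(altruistic option) = {p[0]:.2f}")
```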
“We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group’s behavior, and that this learned value system can generalize to new scenarios,” the authors wrote in an earlier, connected study.
Shortcomings
The researchers said the models do not “understand” culture in a symbolic or human sense, nor do they reason about morality abstractly. Altruism was deliberately chosen as a tractable proxy for value learning because it can be operationalized, measured and evaluated experimentally. The learned reward functions describe patterns in behavior, not ethical judgments about what ought to be done.
The implications of the work are methodological rather than philosophical. By demonstrating that value-relevant behavioral variation can be encoded through learned reward functions without explicitly labeling values or embedding normative assumptions, the research points to a way of building AI systems that adapt to heterogeneous human preferences. Such an approach could matter in settings where preferences are context-dependent and difficult to formalize, including collaborative decision-making, negotiation and human-in-the-loop automation.
At the same time, the limitations are significant. Learning values from behavior means learning from data that may reflect bias, inequality or situational constraints rather than normative intent. The resulting models are descriptive, not prescriptive, capturing what participants did rather than what they should have done. The researchers frame this not as a solution to machine ethics, but as a step toward systems that can align behavior with observed human patterns rather than imposing a one-size-fits-all optimization framework.