OpenAI explains why language models ‘hallucinate’; evaluation incentives reward guessing over uncertainty

OpenAI finds a key problem in how large language models work. These models often give wrong information confidently. The issue is in how these models are trained and checked. Current methods reward guessing, even if uncertain. OpenAI suggests new ...

By Manu Kaushik, Global Desk | Sep 15, 2025, 05.02 AM IST

OpenAI researchers say hallucinations in AI stem from how models are trained and tested, where scoreboards reward guesses over honest uncertainty

OpenAI has identified a fundamental flaw in the design of large language models (LLMs) that leads to the generation of confident yet incorrect information, known as "hallucinations." This discovery, detailed in a recent research paper, challenges existing assumptions about AI reliability and proposes a paradigm shift in model evaluation.

Hallucinations in AI are statements that are factually incorrect but presented with high confidence. For example, when asked for the title of the PhD dissertation of Adam Tauman Kalai, one of the paper's authors, a chatbot gave three different titles, none of which were accurate. Similarly, it offered three incorrect birthdates for Kalai.

The core issue, as identified by OpenAI researchers, lies in the training and evaluation processes of LLMs. Traditional methods grade each answer as simply correct or incorrect, without accounting for the model's confidence in its responses. This approach inadvertently rewards models for guessing when uncertain, because a guess that happens to be correct earns credit, whereas admitting uncertainty scores zero. Consequently, models are trained to prioritize providing an answer over acknowledging a lack of knowledge.

Hallucinations "persist due to the way most evaluations are graded, language models are optimized to be good test-takers, and guessing when uncertain improves test performance," the paper reads, as quoted by the Futurism website.
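The incentive can be made concrete with a little arithmetic. The following is a minimal sketch, not code from the paper: it assumes a benchmark that awards 1 point for a correct answer and 0 for anything else, and the function name `expected_binary_score` is a hypothetical one chosen for illustration.

```python
def expected_binary_score(p_correct: float, abstain: bool) -> float:
    """Expected score under correct/incorrect grading.

    Assumption for illustration: a correct answer earns 1 point, while a
    wrong answer and an "I don't know" both earn 0 points.
    """
    if abstain:
        return 0.0
    # A guess is right with probability p_correct and earns 1 point when it is.
    return p_correct


# Compare guessing with abstaining at several (hypothetical) confidence levels.
for p in (0.05, 0.25, 0.50):
    guess = expected_binary_score(p, abstain=False)
    skip = expected_binary_score(p, abstain=True)
    print(f"confidence={p:.2f}  guess={guess:.2f}  abstain={skip:.2f}")
```

Under these assumptions a guess never scores worse than an abstention, however low the model's confidence, which is the test-taking behaviour the researchers describe.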

To address this issue, OpenAI suggests a shift towards evaluation methods that value uncertainty and penalize confident inaccuracies. By implementing confidence thresholds, models would be encouraged to refrain from answering when unsure, thereby reducing the likelihood of hallucinations. This approach aims to enhance the reliability of AI systems, especially in critical applications where factual accuracy is paramount.
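How a confidence threshold changes the incentive can be sketched in a few lines. The article does not spell out the scoring rule, so this example assumes one possible form: a correct answer earns 1 point, an abstention earns 0, and a wrong answer costs t/(1 - t) points for a stated threshold t. All function names are hypothetical.

```python
def wrong_answer_penalty(t: float) -> float:
    """Points deducted for a wrong answer at confidence threshold t (assumed rule)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("threshold must satisfy 0 <= t < 1")
    return t / (1.0 - t)


def expected_score(p_correct: float, t: float) -> float:
    """Expected score of answering: +1 if right, -penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_answer_penalty(t)


def should_answer(p_correct: float, t: float) -> bool:
    """Answer only when it beats the 0 points earned by abstaining."""
    return expected_score(p_correct, t) > 0.0


# With t = 0.75 a wrong answer costs 3 points, so a 50/50 guess no longer pays.
for p in (0.50, 0.90):
    print(f"confidence={p:.2f}  expected={expected_score(p, 0.75):+.2f}  "
          f"answer={should_answer(p, 0.75)}")
```

Under this assumed rule the expected score simplifies to (p - t)/(1 - t), so answering pays off only when the model's confidence p exceeds the threshold t. Setting t to 0 removes the penalty and recovers plain correct/incorrect grading.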

"Most scoreboards prioritize and rank models based on accuracy, but errors are worse than abstentions," OpenAI wrote in an accompanying blog post.

Experts acknowledge that eliminating hallucinations may be unattainable, but improvements in training and evaluation methodologies can lead to more trustworthy AI systems. The proposed changes have broader implications for AI development, including potential impacts on user engagement. Models that frequently admit uncertainty might be perceived as less competent, possibly affecting user trust and adoption. Therefore, balancing accuracy with user experience remains a critical consideration.
