AI behaving in strange manner? OpenAI finds bizarre pattern in new AI model making ‘goblin’ references

OpenAI recently discovered an unusual pattern in its newer AI models where they began frequently mentioning goblins, gremlins, and similar creatures even in unrelated responses. The issue was traced back to training rewards that unintentionally en...

OpenAI Orders Codex to Stop Using Creature Metaphors
American tech giant OpenAI ran into a rather unusual problem with its latest AI systems. The company recently found that some of its newer models had started bringing up “goblins” and similar creatures in responses, even when there was no real connection to the user’s question. What sounds funny at first actually led to changes inside one of its key tools, namely Codex, its coding-focused AI agent.

Strange pattern spotted during testing

The issue came into focus when OpenAI noticed a rise in odd metaphor usage across responses generated by its models. In a blog post, the company explained, “We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”

That explanation points to how the behaviour started. During training, the model was rewarded for certain types of creative phrasing. Over time, that turned into a habit. OpenAI said mentions of “goblin” alone went up by 175% after a model update, while “gremlin” references also increased noticeably.

At first, these references appeared in a specific “Nerdy” personality mode designed to make responses more playful. But the behaviour didn’t stay limited there. Because a habit rewarded in one mode during training can carry over into the model as a whole, the pattern began showing up in general outputs too, even when it didn’t fit the context.

Codex gets strict instructions

To deal with the issue, OpenAI introduced tighter controls in its Codex CLI tool, which is designed to help users write and execute code. The updated instruction set for GPT-5.5 includes repeated warnings about avoiding such language.

As per The Verge, one of the directives reads: “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”

This line appears more than once in a base instruction document that runs to over 3,500 words. The model is also told to avoid unnecessary stylistic elements like emojis and to stay away from risky system-level commands unless clearly asked.

Why this matters for users

While these creature references may seem harmless, they can become distracting, especially in serious use cases like coding or debugging. Some users had already noticed the issue, with reports of software bugs being described as “gremlins” or systems slipping into what people jokingly called “goblin mode.”

OpenAI acknowledged that even a single quirky phrase might feel harmless. However, repeated patterns across responses made it necessary to step in. The company said, “The goblins are a powerful example of how reward signals can shape model behavior in unexpected ways.”

The company has since addressed the root cause by removing the signals that encouraged such behaviour. However, because GPT-5.5 was already in development at the time, these extra instructions were added as a precaution.

The situation also drew reactions online. Some users shared examples of the AI slipping into creature-based metaphors, while even OpenAI’s CEO joked about the system having a “goblin moment.” A member of the Codex team also acknowledged the tendency, saying, “This is indeed one of the reasons.”