OpenAI Codex System Prompt Includes Explicit Directive To 'Never Talk About Goblins'
An anonymous reader quotes a report from Ars Technica: The system prompt for OpenAI's Codex CLI contains a perplexing and repeated warning for the most recent GPT model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
The explicit operational warning was made public last week as part of the latest open source code for Codex CLI that OpenAI posted on GitHub. The prohibition is repeated twice in a 3,500-plus word set of "base instructions" for the recently released GPT-5.5, alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed" and to "never use destructive commands like 'git reset --hard' or 'git checkout --' unless the user has clearly asked for that operation."
Separate system prompt instructions for earlier models contained in the same JSON file do not contain the specific prohibition against mentioning goblins and other creatures, suggesting OpenAI is fighting a new problem that has popped up in its latest model release. Anecdotal evidence on social media shows some users complaining about GPT's penchant for focusing on goblins in completely unrelated conversations in recent days. Update: OpenAI has published a blog post explaining "where the goblins came from."
In short, a training signal meant to encourage its "Nerdy" personality accidentally rewarded creature-heavy metaphors, causing words like "goblins" and "gremlins" to spread beyond that personality into broader model behavior. OpenAI says it has since retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered creature-word examples from training data to keep the quirk from resurfacing in inappropriate contexts.
Re: (Score:2)
# fuck this shit, fucking lscagg, fuck my life its fucking miller time, actually fuck it. inviting my friends johnny, jim and jack over , fucking three wise men are needed to wipe this fucking shit from my brain
Funny but serious (Score:3)
Re: (Score:2)
Re:Funny but serious (Score:4, Insightful)
To be fair, in this instance they almost specifically instructed the AI to act like this:
"You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]"
Why the hell they thought that is what a "Nerdy" personality is, is a whole different story.
Re: (Score:2)
To be fair, in this instance they almost specifically instructed the AI to act like this:
"You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]"
Why the hell they thought that is what a "Nerdy" personality is, is a whole different story.
To be fair, Demolition Man was a little nerdy at the time, and might explain the inspiration.
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
To be fair, Demolition Man was a little nerdy at the time, and might explain the inspiration.
Yup. I mean, how many other science-fiction action movies discuss Constitutional Amendments [youtube.com]? :-)
Re: (Score:2)
It's why my sister-in-law got upset with me when I pointed to cows and told my 2-year-old niece, "Look at those dogs."
Re: (Score:2)
It's why my sister-in-law got upset with me when I pointed to cows and told my 2-year-old niece, "Look at those dogs."
And Calvin's dad describing how the world was black and white [reddit.com] before colors were invented. :-)
I once had a 1969 VW Beetle and those only had one dashboard light for both turn signals -- like this <=> -- and a friend asked me how you could tell if it was lighting up for the left or right signal. I replied that it blinked on then off for the left signal but off then on for the right.
Re: Funny but serious (Score:2)
Re: (Score:3)
an amusing example of how training can go wrong
My understanding is that this isn't a consequence of a flawed training algorithm or process; it's instead a consequence of the limitations of LLMs, emergent from their training materials. It closely parallels another example I've seen around the net, that of asking an LLM about getting a car to the mechanic, noting it's a sunny day and the mechanic is just a block away, and having the LLM suggest walking... which is a consequence of the bias in training materials toward walking because lots of people make
I'm more concerned about this (Score:2)
>>alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed"
Since overuse of emojis and em dashes is a classic indicator of AI-generated text that people now know to look for, it's pretty clear they are actively trying to hide the nature of their LLM output.
Re: (Score:2)
Re: (Score:2)
If you care about precision in language, emojis have no place. What each one means can sometimes be very ambiguous. I only really use AI in search results, but I prefer to get information in precise language, not wishy-washy nonsense or some pictogram that could mean literally anything.
Re: (Score:2)
You could make all the same criticisms of most common English.
For instance, what if someone says that they are ambivalent about a decision?
What do they mean? About half the people you ask would say he doesn't care either way. The other half would probably say he is torn, or of two minds about it. Of that latter group, some might conclude this also implies grave concern about the issue, while others wouldn't.
So how much clearer a communication is that than writing -\_o_/- ?
Re: (Score:2)
The difference is between using a word such as "ambivalence," which has a specific meaning found in the dictionary, and an emoji whose meaning isn't etched in stone.
For the record, ambivalence means to be undecided. What you wrote, -\_o_/-, has zero meaning to me. I tried looking it up and no go. I suppose I can look at it and it kind of sort of looks like a person doing a shrug motion but I'm only guessing that because you already mentioned ambivalence, which has a specific meaning that's very easy to look up.
Maybe I'm jus
Example of control (Score:2)
Also - don't mention the war! (Score:2)
I did once, but I think I got away with it.
Re: (Score:2)
We've always been at war with Eurasia.
This is what uncurated training causes (Score:5, Insightful)
If you neglect this, then the model may fail in anomalous and unpredictable ways. In other words: you can run 10,000 tests and they'll all be just fine, but when you run the 10,001st, the model fails. Worse, you won't know how...or why...or how to fix it, because the answers to those questions are buried in a network too large for a human being to comprehend. This problem has been well-known for decades; it's how things like this: Tesla Autopilot Confuses Boy In Orange Shirt For A Cone In Brazil [insideevs.com] happen. They thought they were training the vision system to recognize traffic cones; they were really training it to recognize orange objects of a certain size and height:width ratio.
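The cone mix-up above is a textbook spurious correlation. Here's a toy sketch of the failure mode (pure Python, with entirely hypothetical data; no relation to Tesla's actual vision stack): a logistic regression trained on examples where every cone happens to be orange learns to key on color rather than shape, then confidently flags an orange non-cone.

```python
import math

# Hypothetical training set. Each sample is (is_orange, cone_shaped);
# label 1 means "traffic cone". Color correlates perfectly with the
# label, while shape is noisy (an occluded cone, a gray pointy roof).
X = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
y = [1, 1, 1, 0, 0, 0]

# Plain logistic regression trained by batch gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(3000):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), t in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        e = p - t
        gw[0] += e * x1
        gw[1] += e * x2
        gb += e
    w[0] -= lr * gw[0] / len(y)
    w[1] -= lr * gw[1] / len(y)
    b -= lr * gb / len(y)

def prob_cone(is_orange, cone_shaped):
    """Model's estimated probability that an object is a cone."""
    return 1.0 / (1.0 + math.exp(-(w[0] * is_orange + w[1] * cone_shaped + b)))

# A boy in an orange shirt: orange-colored, but not cone-shaped.
print(prob_cone(1, 0) > 0.5)  # the "cone detector" fires anyway
```

Curating the data would mean noticing the confound before training: in this toy setup, adding orange non-cones labeled as non-cones is what forces the model back onto the shape feature.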
Faced with this situation, you can either (a) go back and figure out what you did wrong in the training process or (b) slap a half-ass patch on this particular failure to just make it go away. Choosing (b) is simple and quick and easy and cheap. But if you pick that choice and skip (a), then you have zero assurance that the 15,027th test or the 21,922nd test won't fail just as badly, because you did nothing to address the root cause.
And predictably, this -- choice (b) -- is what OpenAI has done. It's predictable because they made no attempt whatsoever to curate the training data in the first place -- they just stole everything they could from the entire Internet -- because they're cheap and lazy and in a hurry to cash in before the bubble bursts. This move is entirely consistent with that approach. I would call it "poor software engineering" but it doesn't even deserve to be in the same sentence with "engineering".
Imagine hiring an employee (Score:2)
Imagine hiring an employee that required such a level of micro-management. You'd be showing them the door and thanking them for their contributions.
encourage its "Nerdy" personality (Score:2)
Robots don't need personality
They need to work accurately and reliably
stop (Score:2)
Be careful. Something doesn't look right.
LibreWolf spotted a potentially serious security issue with arstechnica.com. Someone pretending to be the site could try to steal things like credit card info, passwords, or emails.
Be careful out there.
Seven Flirty Words. (Score:2)
"never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures"
Pardon me while I channel my inner Carlin creating my next set of OpenAI usernames.
Trolly McTrollface is about to go HAM on that new sticks-n-stones plugin..
Personality (Score:2)
How about not giving it a personality at all? Just do what I say to do.
Don't mention goblins (Score:2)
Please add... (Score:2)
Asimov's three laws.
We have enough psychopaths and sociopaths trying to rip us off and kill us with MI (meat intelligence). Let's see if we can keep AI ready to protect humans, obey humans, and protect itself. In that order.
Welcome to Slashdot (Score:2)
The first rule of Fight Club (Score:2)
Fight Club (Score:2)
First rule of Goblins (AI) never talk about Goblins.