I went looking for a boring answer and did not get one.

The question sounded simple: on modern OpenAI-style APIs, what is the real difference between top-level instructions, a developer message, and the old system idea? Is this just naming churn, or is there something operationally useful here?

There is.

The useful distinction is not philosophical. It is architectural.

That last part is the whole game.

The fast mental model

If you only remember one thing, make it this:

System/instructions are the constitution. Developer is the dispatch radio.

A constitution is durable. It defines identity, tone, and broad rules.

A dispatch radio is live. It can say things like "the platform just changed," "permissions were updated," or "this tool is now available."

Those are not the same category of information. A lot of prompt designs jam them into one big immortal blob and then wonder why the result feels brittle.

What we found on the direct ChatGPT backend

Testing the direct ChatGPT backend was instructive even if it was not the public API.

What it showed: the channels behave differently in practice, and in direct conflicts a woven developer message could override the top-level field.

That is already enough to kill the lazy idea that all of these channels are equivalent.

Not equal. Not even close.

Why developer messages are interesting

Because they can be inserted mid-conversation.

That sounds minor until you see what it buys you.

A developer message can arrive after turn 1, after turn 5, or after a state transition. It can say, in effect: "conditions just changed; here is the new operating state for the turns that follow."

That means the prompt stops being one monolithic block at the top of the thread and starts acting more like a control plane.

That is the interesting part.
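As a minimal sketch of that control-plane idea (hypothetical helper name, Python, OpenAI-style message dicts), the conversation is just an ordered weave of role-tagged messages, and a developer update is one more entry you can append mid-thread:

```python
# Hypothetical sketch: a developer message arriving mid-conversation.
# The thread is an ordered list of role-tagged messages; "developer"
# is simply one more lane in the weave.

def inject_developer_update(messages, update_text):
    """Append a runtime developer message to the live conversation weave."""
    messages.append({"role": "developer", "content": update_text})
    return messages

thread = [
    {"role": "user", "content": "Help me compare these two APIs."},
    {"role": "assistant", "content": "Here are the differences..."},
]

# A runtime condition changed; the control plane speaks mid-thread.
inject_developer_update(
    thread,
    "<context_update>\nplatform: web\nRicher formatting is now allowed.\n</context_update>",
)
```

The point is not the helper; it is that the update lands between turns with its own role, instead of being smuggled into user text.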

A visual mock session

Here is the simplest way to see the difference.

Without developer weaving

INSTRUCTIONS:
You are Jarvis. Be concise. Use bullets on web. Be short on Telegram.

USER:
Help me compare these two APIs.

ASSISTANT:
Here are the differences...

[platform changes from chat CLI to web UI]

USER:
Now format this more richly.

ASSISTANT:
Here are the differences... (still terse, CLI-style — it never saw the platform change)
If the platform change only lives outside the model, or gets awkwardly stuffed into user text, the model has no clean first-class way to understand that a runtime condition changed.

With developer weaving

INSTRUCTIONS:
You are Jarvis. Direct, concise, no filler.

USER:
Help me compare these two APIs.

ASSISTANT:
Here are the differences...

DEVELOPER:
<context_update>
Execution context changed:
- platform: web
You may now use richer formatting such as headings, bullets, and links when useful.
Do not become verbose by default.
</context_update>

USER:
Now format this more richly.

ASSISTANT:
## API Differences

- ...
- ...

That developer message is not identity. It is not user intent. It is runtime state.

That is exactly why it belongs in a separate lane.

Codex is a good example because it actually uses this split

Codex does not treat this as a theory exercise. It uses both lanes.

instructions in Codex

Codex sends a large base prompt as top-level instructions.

In the current source, that base prompt lives in its own prompt file.

The first hundred words of the gpt-5.4 base instructions start like this:

You are Codex, a coding agent based on GPT-5. You and the user share the same workspace and collaborate to achieve the user’s goals…

And the tail end includes operational communication rules like frequent progress updates and how long those updates may be.

So this is not a tiny stub. It is the durable operating shell.

developer in Codex

Codex also assembles runtime developer messages directly in code, at the point where the session is constructed.

The developer bundle is built from sections such as sandbox and approval permissions, environment context, active skills, and per-repo instructions.

That list matters because it shows what Codex thinks developer is for.

Not identity. Not branding. Runtime control.

One concrete Codex example: sandbox and approval policy

A very good operational-policy example from Codex is the permissions block.

It tells the model things like which directories are writable, whether network access is enabled, and when it must ask for approval before escalating.

That is a classic developer message.

It is specific to the live session. It may change later. It is not part of the model’s timeless personality.

Trying to cram that into one giant static instructions field is how prompt architecture turns into a junk drawer.
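Rendering that kind of permissions block is mechanical. Here is a sketch; the field names mirror the example values used later in this post (sandbox_mode, network) and are illustrative, not Codex's actual schema:

```python
# Hypothetical sketch: rendering a live permissions snapshot into a
# developer message. Field names are illustrative, not Codex's schema.

def render_permissions(policy: dict) -> str:
    """Format a policy dict as a tagged permissions block."""
    lines = [f"- {key}: {value}" for key, value in policy.items()]
    return "<permissions>\n" + "\n".join(lines) + "\n</permissions>"

message = {
    "role": "developer",
    "content": render_permissions(
        {"sandbox_mode": "workspace-write", "network": "disabled"}
    ),
}
```

Because it is generated from session state, it can be regenerated whenever that state changes — which is exactly what a static instructions field cannot do.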

Strength: what developer tends to beat

The practical strength story is roughly:

- instructions outrank user preferences for identity and tone
- developer outranks user for operational policy
- and in direct conflicts, a woven developer message can outrank the top-level instructions field

That last point surprised me the first time I tested it.

People often assume the top field is the king. Sometimes it is just the lobby sign.

The actual traffic control may be happening in the woven developer history.

What belongs in instructions

Use instructions for durable things: identity, voice, tone, and the broad standing rules that should hold for the whole session.

For Jarvis-style systems, that is the layer that says "You are Jarvis. Direct, concise, no filler."

That is constitution material.

What belongs in developer

Use developer messages for live policy: sandbox and approval state, network permissions, tool and skill availability, platform or surface changes, and anything else that can flip mid-session.

This is dispatch-radio material.

Simulating developer messages on providers that do not support them

This is where things get mildly ugly.

Not every provider gives you a clean native developer role.

Anthropic is the obvious example. In the normal Messages API shape you get a top-level system field and then user/assistant messages. There is no first-class woven developer lane that behaves like OpenAI’s.

So what do you do?

Pattern 1: keep an internal split anyway

Even if the provider flattens things, your application should still distinguish internally between:

  1. base instructions
  2. developer policy
  3. context updates
  4. user messages

That internal taxonomy is worth having even if one provider squashes two layers together.
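One way to keep that split first-class in application code, regardless of what any single provider supports (names are illustrative, not a standard):

```python
# Illustrative sketch of the four-layer internal taxonomy.
from dataclasses import dataclass, field

@dataclass
class PromptLayers:
    base_instructions: str                                  # 1. durable identity and voice
    developer_policy: dict = field(default_factory=dict)    # 2. authoritative runtime rules
    context_updates: list = field(default_factory=list)     # 3. woven runtime events
    user_messages: list = field(default_factory=list)       # 4. actual user input

layers = PromptLayers(
    base_instructions="You are Jarvis. Direct, concise, no filler.",
    developer_policy={"sandbox_mode": "workspace-write", "network": "disabled"},
    context_updates=["platform: web"],
    user_messages=["Help me compare these two APIs."],
)
```

Lowering to a specific provider then becomes a rendering problem, not a rewrite of your prompt architecture.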

Pattern 2: compile authoritative policy into system

For providers without developer support, take the current effective snapshot of authoritative policy and merge it into the top-level system prompt.

For example:

<base_instructions>
You are Jarvis. Direct, concise, no filler.
</base_instructions>

<developer_policy>
- sandbox_mode: workspace-write
- network: disabled
- active_skill: wasnotwas
- repo_instruction: do not edit generated files directly
</developer_policy>

That preserves authority better than pretending these are ordinary user messages.
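The compilation step is a straightforward string render. A sketch, using the wrapper tag names from the example above:

```python
# Sketch of Pattern 2: snapshot the current authoritative policy and
# compile it, with the base instructions, into one top-level system
# string for providers without a developer lane.

def compile_system_prompt(base_instructions: str, developer_policy: dict) -> str:
    """Merge base instructions and the current policy snapshot into one system prompt."""
    policy_lines = "\n".join(f"- {k}: {v}" for k, v in developer_policy.items())
    return (
        "<base_instructions>\n"
        f"{base_instructions}\n"
        "</base_instructions>\n\n"
        "<developer_policy>\n"
        f"{policy_lines}\n"
        "</developer_policy>"
    )

system = compile_system_prompt(
    "You are Jarvis. Direct, concise, no filler.",
    {"sandbox_mode": "workspace-write", "network": "disabled"},
)
```

The key property: the system string is always derived from the *current* snapshot, so stale policy never lingers in the prompt.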

Pattern 3: represent woven context updates explicitly

Some updates are not really policy. They are events.

Examples: the user moved from the CLI to the web UI, a session was resumed after a gap, or the active surface changed in some other way.

On a provider without native developer-role weaving, you can inject a synthetic context message with clear wrappers, for example:

<context_update source="runtime">
Execution context changed:
- platform: web
Formatting can now be richer.
</context_update>

If that must be lowered as a user-like message because the API only supports user/assistant, fine. But call it what it is: a context update, not genuine user intent.
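The lowering itself is trivial; what matters is the explicit wrapper. A sketch:

```python
# Sketch of Pattern 3: lowering a runtime event to a user-role message
# on a user/assistant-only API, wrapped so the model (and your logs)
# can tell it is a context update, not genuine user intent.

def lower_context_update(update_body: str) -> dict:
    """Wrap a runtime event and emit it as a synthetic user-role message."""
    content = (
        '<context_update source="runtime">\n'
        f"{update_body}\n"
        "</context_update>"
    )
    # The role is "user" only because the provider offers nothing better.
    return {"role": "user", "content": content}

msg = lower_context_update("Execution context changed:\n- platform: web")
```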

Pattern 4: preserve authority over chronology for real policy

This is the important compromise.

For real policy changes — approval mode, sandbox, network permissions, tool availability — it is usually better to preserve authority than exact weave position.

So on Anthropic-like providers: when real policy changes mid-session, recompile the effective snapshot into the system prompt and reissue it, and reserve woven synthetic messages for event-like context updates.

That is not as elegant as native developer messages, but it is honest about the trade-off.
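A sketch of what "authority over chronology" looks like in code, assuming a simple session dict (helper and field names are illustrative):

```python
# Sketch of Pattern 4: when real policy changes mid-session on an
# Anthropic-like provider, rebuild the top-level system snapshot
# instead of weaving the policy change into the chat history.

def apply_policy_change(state: dict, key: str, value: str) -> dict:
    """Update one policy field and recompile the effective system string."""
    state["developer_policy"][key] = value
    # Rebuild from the *current* snapshot; old values simply disappear.
    policy_lines = "\n".join(
        f"- {k}: {v}" for k, v in state["developer_policy"].items()
    )
    state["system"] = (
        state["base_instructions"]
        + "\n\n<developer_policy>\n"
        + policy_lines
        + "\n</developer_policy>"
    )
    return state

session = {
    "base_instructions": "You are Jarvis. Direct, concise, no filler.",
    "developer_policy": {"network": "disabled"},
    "system": "",
}
apply_policy_change(session, "network", "enabled")
```

The trade-off is visible in the code: the model never sees *when* the policy flipped, only what the policy is now. For permissions, that is usually the right trade.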

The internal model I would use

The clean abstraction is:

1. Base instructions

Stable identity and voice.

2. Developer policy

Authoritative runtime rules.

3. Context updates

Woven events that change how the current turn should be interpreted.

4. User messages

Actual user input.

Then lower those to each provider as best you can.

OpenAI-style backends can preserve more of the structure. Other providers may flatten parts of it. That is fine. The point is to stop pretending one giant prompt blob is enough.

Why this matters more than people think

Because once tools, permissions, modes, resumed state, and UI surface changes are involved, prompt architecture is no longer just “write a better system prompt.”

It becomes state management.

And state management wants layers.

Codex appears to understand this already. It keeps a substantial base instructions prompt, then uses developer messages as a live operational channel. That is a better mental model than treating everything as one top-level slab.

The blunt conclusion

Developer messages are interesting because they let you inject live control into the conversation without pretending it is either timeless identity or actual user speech.

That is their job.

If your provider supports them, use them for runtime policy and context updates. If it does not, keep the split internally anyway and compile the best approximation you can.

The mistake is not using the wrong role name. The mistake is collapsing four kinds of information into one immortal paragraph and hoping the model sorts it out.