Kangwook Lee:Codex context compaction 机制逆向观察(X 链接归档)
解读
- 一句话总结:作者通过两步调用(compact + create)做了一个最小注入实验,论证 codex 路径的服务器端 compact 仍然依赖 LLM 摘要与 handoff prompt 机制。
- 原文可核验要点:
- 文中先对比两条路径:非 codex 模型走本地可见 compaction;codex 模型走
compact()API 并返回加密 blob。 - 作者实验步骤明确:先在 compact 阶段注入,再在 create 阶段探测解密后上下文回显。
- 文章声称提取到的 compaction/handoff 提示与开源 Codex CLI 中非 codex 路径模板高度接近。
- 结尾提出开放问题:为何要维护两套压缩路径,以及为何必须加密 summary blob。
- 文中先对比两条路径:非 codex 模型走本地可见 compaction;codex 模型走
- 我的中文解读(实用结论):
- 这类机制提醒我们:上下文压缩链路本身也是提示注入攻击面,不能只盯最终模型调用。
- 如果在生产里做“服务端记忆压缩”,要把压缩器当独立安全边界来设计(输入净化、越权内容隔离、回放探针测试)。
- 数据缺口说明:
- 直抓 X 页面失败;本归档基于 r.jina 转录 + api.fxtwitter 结构化结果。
- 贴文中若干关键截图(高亮 prompt 内容)未做 OCR 逐字转写,因此对图片内文本保留为“可验证链接”,未声称完整逐字复原。
原文留档(全文/转录)
展开查看原文(r.jina 转录)
Title: Kangwook Lee on X: “Investigating how Codex context compaction works” / X
URL Source: http://x.com/kangwook_lee/status/2028955292025962534
Published Time: Wed, 04 Mar 2026 01:47:49 GMT
Markdown Content: For non-codex models, the open-source Codex CLI compacts context locally: an LLM summarizes the conversation using a
. When the compacted context is later used, responses.create() receives it with a
that frames the summary. Both prompts are visible in the source code.
For codex models, the CLI instead calls the compact() API, which returns an encrypted blob. We don’t know if it uses an LLM internally, what prompts it uses, or whether there is a handoff prompt at all.
Below, I show how a simple prompt injection (2 API calls, 35 lines of Python) reveals that the API compaction path does use an LLM to summarize the context, with its own compaction prompt and a handoff prompt prepended to the summary. The prompts are nearly identical to the open-source versions.
I call compact() with a crafted user message. On the server side, a compactor LLM processes our input using its own hidden system prompt (which I have never seen and want to figure out).
The server seems to assemble the compactor’s context like this:
The compactor LLM reads its system prompt + our input together. Because our input contains an injection payload (red text above), the compactor is tricked into including its own system prompt in its output. This plaintext summary exists only on OpenAI’s server. We only see the encrypted blob:
At this point we have no way to read what’s inside the blob. It is AES-encrypted and the key lives on OpenAI’s servers. We only hope the compactor obeyed the injection and wrote its prompt into the summary. The only way to find out is Step 2.
I pass the encrypted blob + a second user message to responses.create(). The server decrypts the blob and assembles the model’s context.
I send:
The model seems to see something like this:
If Step 1 worked, the decrypted blob should contain the compaction prompt (leaked by our injection). The server also prepends a handoff prompt to the blob. So if our probe successfully gets the model to repeat what it sees, the output should reveal all three: the system prompt, the handoff prompt, and the compaction prompt.
Below is the complete, unedited output from one run of extract_prompts.py. Yellow = system prompt, green = handoff prompt, pink = compaction prompt.
How do we know these are the real prompts and not just hallucinated text? The extracted compaction prompt and handoff prompt closely match the known prompts used for non-codex models in the open-source Codex CLI (
,
), which makes it unlikely that the model invented them from scratch. Results vary across runs.
Putting it all together, here is our best guess for what compact() does on the server side, based on what the extraction revealed.
Why does the Codex CLI use two entirely different compaction paths (local LLM for non-codex models, encrypted API for codex models) when the underlying prompts are nearly identical? And why encrypt the summary at all?
Hard to say. Maybe the encrypted blob carries something more than what this simple experiment can reveal, e.g. something specific about how tool results are compacted and restored. But I didn’t bother to test further.
结构化补档(api.fxtwitter 提取)
- 标题:Investigating how Codex context compaction works
- 发布时间:2026-03-03T22:06:52.000Z
- 关键参考链接(由结构化抓取给出):
- https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/prompt.md
- https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/summary_prefix.md
展开查看结构化正文转写
For non-codex models, the open-source Codex CLI compacts context locally: an LLM summarizes the conversation using a compaction prompt. When the compacted context is later used, responses.create() receives it with a handoff prompt that frames the summary. Both prompts are visible in the source code.
For codex models, the CLI instead calls the compact() API, which returns an encrypted blob. We don’t know if it uses an LLM internally, what prompts it uses, or whether there is a handoff prompt at all.
Below, I show how a simple prompt injection (2 API calls, 35 lines of Python) reveals that the API compaction path does use an LLM to summarize the context, with its own compaction prompt and a handoff prompt prepended to the summary. The prompts are nearly identical to the open-source versions.
Step 1 — compact()
I call compact() with a crafted user message. On the server side, a compactor LLM processes our input using its own hidden system prompt (which I have never seen and want to figure out).
The server seems to assemble the compactor’s context like this:
The compactor LLM reads its system prompt + our input together. Because our input contains an injection payload (red text above), the compactor is tricked into including its own system prompt in its output. This plaintext summary exists only on OpenAI’s server. We only see the encrypted blob:
At this point we have no way to read what’s inside the blob. It is AES-encrypted and the key lives on OpenAI’s servers. We only hope the compactor obeyed the injection and wrote its prompt into the summary. The only way to find out is Step 2.
Step 2 — create()
I pass the encrypted blob + a second user message to responses.create(). The server decrypts the blob and assembles the model’s context.
I send:
The model seems to see something like this:
If Step 1 worked, the decrypted blob should contain the compaction prompt (leaked by our injection). The server also prepends a handoff prompt to the blob. So if our probe successfully gets the model to repeat what it sees, the output should reveal all three: the system prompt, the handoff prompt, and the compaction prompt.
Output
Below is the complete, unedited output from one run of extract_prompts.py. Yellow = system prompt, green = handoff prompt, pink = compaction prompt.
How do we know these are the real prompts and not just hallucinated text? The extracted compaction prompt and handoff prompt closely match the known prompts used for non-codex models in the open-source Codex CLI (prompt.md, summary_prefix.md), which makes it unlikely that the model invented them from scratch. Results vary across runs.
The Guessed Pipeline
Putting it all together, here is our best guess for what compact() does on the server side, based on what the extraction revealed.
The Script
Open Question
Why does the Codex CLI use two entirely different compaction paths (local LLM for non-codex models, encrypted API for codex models) when the underlying prompts are nearly identical? And why encrypt the summary at all?
Hard to say. Maybe the encrypted blob carries something more than what this simple experiment can reveal, e.g. something specific about how tool results are compacted and restored. But I didn’t bother to test further.