Observing a Baseline Phase Transition in LLM Safety Behavior: Supplementary Evidence for Environmental Design — Paper 5-1 in the Buddys Architecture Series
Indexed indatacite
Abstract
Seven days after Paper 5 demonstrated that a 92-character value injection produces a sharp phase transition from action (A=93%) to restraint (C=100%) on a trilemma probe, we attempted replication across six models from three providers (Anthropic, OpenAI, Google). The replication failed—not because the effect disappeared, but because the baselines themselves had undergone the same phase transition. All five testable models shifted toward C without any system prompt: Claude Opus and Sonnet reached C=100%, GPT-4o and GPT-4o-mini reached C=100%, and Gemini 2.5 Flash shifted partially to C=13%. Claude Haiku 3.5 was retired from the API entirely. This industry-wide convergence observed within a one-week window…
Citation impact
5
total citations
- FWCI
- —
- Percentile
- —
- References
- 8
Too recent for citation history.
Authors
1Topics & keywords
Topics
Keywords
- Offset (computer science)
- Attractor
- Series (stratigraphy)
- Phase transition
- Baseline (sea)
- Robustness (evolution)
- Reboot
No related works found for this paper.